Blake Matheny

Oct 3

ID Generation at Scale

Most companies that operate at a large enough scale run into the problem of efficiently generating IDs. At Tumblr this problem was solved with a simple libevent-based HTTP service that essentially generates IDs fast enough to meet our current demand and handles failure/startup by grabbing the MAX(id) (plus a large cushion). This approach has a duplicate ID/multi-datacenter problem, but we’ll deal with that soon enough.
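To make the startup behavior concrete, here is a minimal Python sketch of the idea. It is not the actual service, which is a libevent-based HTTP service. The class name `IdGenerator`, the `max_persisted_id` argument, and the cushion value are all hypothetical, chosen only for illustration.

```python
import threading

# Hypothetical cushion size; the real value is not stated in this post.
CUSHION = 1_000_000


class IdGenerator:
    """In-memory counter that restarts from MAX(id) plus a cushion."""

    def __init__(self, max_persisted_id: int, cushion: int = CUSHION) -> None:
        # max_persisted_id stands in for the result of a MAX(id) lookup.
        # The cushion skips past IDs that may have been handed out before
        # a crash but never written to the datastore.
        self._lock = threading.Lock()
        self._next = max_persisted_id + cushion + 1

    def next_id(self) -> int:
        # The lock keeps concurrent callers from receiving the same ID.
        with self._lock:
            value = self._next
            self._next += 1
            return value


gen = IdGenerator(max_persisted_id=41_000)
first = gen.next_id()   # 1_041_001
second = gen.next_id()  # 1_041_002

# Two independent instances seeded from the same MAX(id) collide,
# which is the duplicate ID/multi-datacenter problem in miniature.
a = IdGenerator(max_persisted_id=100, cushion=10)
b = IdGenerator(max_persisted_id=100, cushion=10)
collision = a.next_id() == b.next_id()  # True
```

The cushion trades a gap in the ID space for safety on restart. The collision at the end shows why a single counter like this does not extend to multiple datacenters without further coordination.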

Instagram recently wrote an interesting piece on their approach to ID generation. The author looked at Twitter’s Snowflake (which I haven’t always been kind to) and discussed why it wasn’t the right fit for their problem. The author also went into a fair amount of detail about their ID generation scheme and why they made specific implementation choices.

Although I personally don’t like the approach Instagram took, I do appreciate their engineering-in-the-open style post and the insight it provides into their infrastructure. I also like that the team looked at Snowflake, which I think highlights something that Twitter has been great at but hasn’t gotten much credit for: Twitter puts their stuff out on GitHub, whether it’s usable by the masses or not. Although this means you can’t always take their stuff and just use it, it gives people like the Instagram team ideas on how others are solving problems. I think operating at scale and in the open, like Instagram, Twitter, and others are doing, is a credit to their engineering teams and company culture. I wish we saw this kind of stuff from companies like Google & Amazon.