Blake Matheny

Collins - Infrastructure Management for Engineers

At Tumblr we strive to automate as much as is reasonable. Automation helps us manage thousands of servers, our MySQL topology, our software deployments, our configuration updates, A/B testing, etc. As a production environment grows, it generally becomes less consistent and more difficult to manage, even with tools like Puppet and Chef. Eventually you need a central point of truth from which you can determine the state of any given asset in your environment.

When we started building a second production datacenter last year, we needed a way to manage the intake process for the thousands of servers that would be shipped to us. Although we built Collins to support our secondary datacenter, we deployed it to the legacy production environment in February 2012. At that point we already had roughly 1500 production servers, yet we had no consistent view of the environment. The initial problem we set out to solve was simply to inventory the production environment and get a sense of which servers were in use, which weren’t, and where there might be cost savings. After we completed the inventory process, Collins quickly found another use in helping automate the management of our MySQL topology via Jetpants.

Pretty quickly, most of our infrastructure was using or populating Collins data in some way: Puppet, Func, the deployment tool, host provisioning, graphing/trending, proxy configuration, Nagios configuration, DNS configuration, etc. Today an engineer at Tumblr can log in to Collins using their LDAP credentials, find an available host, click Provision, and be on their new dev box in under an hour. This is actually part of the developer on-boarding process.

In our recipes document you can find some sample use cases for Collins like:

Today we are open sourcing Collins, the Tumblr infrastructure management system. Collins was developed using the Play framework but was designed so that people without any Java/Scala experience could integrate with it using the API or via Callbacks. Here at Tumblr we have Bash, Python, and Ruby integrations with Collins, all developed by different people. There is also a PHP SDK for Collins in the works.
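To give a feel for what an API integration from a scripting language might look like, here is a minimal Python sketch that builds an authenticated HTTP request for an asset search. Everything specific in it is an assumption made for illustration: the host name, the credentials, the `/api/assets` path, and the `status` and `type` filter names are placeholders, and the helper `build_asset_query` is hypothetical. Check the Collins API documentation for the actual endpoints and parameters before relying on any of it.

```python
import base64
import urllib.parse
import urllib.request


def build_asset_query(base_url, username, password, **filters):
    """Build (but do not send) a GET request for an asset search.

    Assumption: the "/api/assets" path and the filter names passed in
    are illustrative placeholders, not a confirmed Collins API contract.
    """
    # Sort the filters so the resulting URL is deterministic.
    query = urllib.parse.urlencode(sorted(filters.items()))
    url = f"{base_url.rstrip('/')}/api/assets?{query}"

    # HTTP Basic authentication header built from the given credentials.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical host, credentials, and filter values.
request = build_asset_query(
    "http://collins.example.com:9000",
    "someuser",
    "somepassword",
    status="Unallocated",
    type="SERVER_NODE",
)
print(request.full_url)
```

The sketch stops short of sending the request, so it runs without a Collins server. Passing `request` to `urllib.request.urlopen` would perform the call against a real instance.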

We are releasing the following components, available under the Apache License v2.0:

The Documentation is available under the Creative Commons BY 3.0 license.

Collins can be integrated with the Jetpants MySQL management toolkit through an open source plugin called jetpants_collins. This plugin allows Jetpants to use Collins as the single point of truth for your hardware inventory, automatically querying the list of pools, shards, hosts, and database instances in your infrastructure. Furthermore, every change you make to your MySQL topology using Jetpants (master promotions, shard splits, cloning replicas, etc.) will be reflected in Collins immediately and automatically.

Over the next month we will also be open sourcing a number of other related components which you can find out more about here and here.

In the meantime, here are some more links to get you started:

A number of people were responsible for helping make Collins so successful at Tumblr. Big thanks to Dan Simon, Steve Salevan, Joshua Hoffman, Dallas Marlow, Brad McDuffie, Evan Elias and all the rest of the Tumblr engineering team. Additionally, a number of companies helped beta test Collins and provided feedback. Thanks to all of you!

Oh, and if you’re interested in this type of work, we’re hiring!