Blake Matheny


Posts tagged with "scala"

Tumblr Firehose - The Gory Details

Back in December I started putting some thought into the tumblr firehose. While the initial launch was covered here, and the business stuff surrounding it was covered by places like TechCrunch and AllThingsD, not much has been said about the technical details.

First, some back story. I knew in December that a product need for the firehose was upcoming and had simultaneously been spending a fair amount of time thinking about the general tumblr activity stream. In particular I had been toying quite a bit with trying to figure out a reasonable real-time processing model that would work in a heterogeneous environment like the one at Tumblr. I had also been quite closely following some of the exciting work being done at LinkedIn by Jay Kreps and others on Kafka and Databus, by Eric Sammer from Cloudera on Flume, and by Nathan Marz from Twitter on Storm.

I had talked with some of the engineers at Twitter about their firehose and knew some of the challenges they had overcome in scaling it. I spent some time reading their fantastic documentation and, after reviewing some of these systems, came up with the system I actually wanted to build, much of it heavily influenced by the great work being done by other people. My ‘ideal’ firehose, from the consumer/client side, had the following properties:

  • Usable via curl
  • Allows a client to ‘rewind’ the stream in case of missed events or maintenance
  • If a client disconnects, they should pick up the stream where they left off
  • Client concurrency/parallelism, e.g. multiple consumers getting unique views of the stream
  • Near real-time is good enough (sub-1s from the time an event is emitted until it is consumed)

From an event emitter (or producer) perspective, we simply wanted an elastic backend that could grow and shrink based on latency and persistence requirements.

What we ended up with accomplishes all of these goals and was fairly simple to implement. We took the best of many worlds (a bit of kafka, a bit of finagle, some flume influences) and created the whole thing in about 10 days. The internal name for this system is Parmesan, which is both a cheese and an Arrested Development character (Gene Parmesan, PI).

The system is composed of 4 primary components.

  • A ZooKeeper cluster, used for coordinating Kafka as well as stream checkpoints
  • Kafka, which is used for message persistence and distribution
  • A thrift process, written with scala/finagle, which the tumblr application talks to
  • An HTTP process, written with scala/finagle, which consumers talk to

The Tumblr application makes a Thrift RPC call containing event data to parmesan. These RPC calls take about 5ms on average, and the client will retry unless it gets a success message back. Parmesan batches these events and uses Kafka to persist them to disk every 100ms. This functionality is all handled by the thrift side of the parmesan application. We also implemented a very simple custom message serialization format so that parmesan could completely avoid any kind of message serialization/deserialization overhead. This had a dramatic impact on GC time (the serialization change wasn’t made until it was needed), which in turn had a significant impact on average connection latency.
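In rough terms, the producer path looks something like the sketch below. This is not the actual parmesan code: the persist function is a placeholder for the Kafka producer write, and the names and structure are just my illustration of the batching described above.

    import java.util.concurrent.{ConcurrentLinkedQueue, Executors, TimeUnit}

    // Illustrative sketch only: events arrive from the Thrift handler, get buffered
    // in memory, and a scheduled task flushes whatever has accumulated to the
    // persistence layer (a Kafka producer in the real system) every 100ms.
    class EventBatcher(persist: Seq[Array[Byte]] => Unit) {
      private val buffer = new ConcurrentLinkedQueue[Array[Byte]]()
      private val scheduler = Executors.newSingleThreadScheduledExecutor()

      // Called from the Thrift RPC handler; enqueue and return quickly.
      def submit(event: Array[Byte]) { buffer.add(event) }

      // Drain the queue and hand the batch off for persistence.
      private def flush() {
        val batch = Iterator.continually(buffer.poll()).takeWhile(_ != null).toSeq
        if (batch.nonEmpty) persist(batch)
      }

      scheduler.scheduleAtFixedRate(new Runnable { def run() { flush() } },
        100, 100, TimeUnit.MILLISECONDS)
    }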

On the client side, any standard HTTP client works and requires (besides a username and password) an application ID and an optional offset. The offset is used to determine where in the stream to start reading from, and is specified either as Oldest (7 days ago), Newest (from right now), or an offset in seconds from the current time in UTC. Up to 16 clients with the same application ID can connect, each viewing a unique partition of the activity stream. Stream partitioning allows you to parallelize your consumption without seeing duplicates. This is a great feature if, for instance, you took your app down for maintenance and want to quickly catch back up in the stream.
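To make that concrete, a consumer written in Scala might look roughly like the sketch below. The hostname, path, and query parameter names here are hypothetical (only the concepts of an application ID and an offset come from the description above); any HTTP client, curl included, works just as well.

    import java.io.{BufferedReader, InputStreamReader}
    import java.net.{HttpURLConnection, URL}
    import javax.xml.bind.DatatypeConverter

    // Hypothetical endpoint and parameter names; only the ideas (username/password,
    // application ID, offset in seconds back from now) come from the post.
    object FirehoseConsumer extends App {
      val url = new URL("https://firehose.example.com/stream?application_id=myapp&offset=3600")
      val conn = url.openConnection().asInstanceOf[HttpURLConnection]

      // HTTP basic auth with the required username and password.
      val credentials = DatatypeConverter.printBase64Binary("user:pass".getBytes("UTF-8"))
      conn.setRequestProperty("Authorization", "Basic " + credentials)

      // Read the stream line by line for as long as the connection stays open.
      val reader = new BufferedReader(new InputStreamReader(conn.getInputStream, "UTF-8"))
      Iterator.continually(reader.readLine()).takeWhile(_ != null).foreach(println)
    }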

Kafka doesn’t easily (natively) support this style of rewinding, so we just persist stream offsets to ZooKeeper. That is, periodically clients with a specific application ID will say, “Hey, at this unixtime I saw a message which had this internal Kafka offset”. By periodically persisting this data to ZooKeeper, we can ‘fake’ this rewind functionality in a way that is useful, but imprecise (we basically have to estimate where in the Kafka log to start reading from).
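A rough sketch of that checkpointing idea, using the plain ZooKeeper client API, is below. The znode layout and payload encoding are my own invention for illustration; they aren’t the actual paths or format parmesan uses.

    import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}

    // Illustrative only: periodically record "at this unixtime, this consumer had
    // read up to this Kafka offset" under a per-application znode.
    object CheckpointStore {
      val zk = new ZooKeeper("localhost:2181", 5000, new Watcher {
        def process(event: WatchedEvent) {}
      })

      def checkpoint(appId: String, partition: Int, kafkaOffset: Long) {
        // Parent znodes are assumed to already exist in this sketch.
        val path = "/parmesan/checkpoints/" + appId + "/" + partition
        val payload = (System.currentTimeMillis / 1000).toString + ":" + kafkaOffset
        val data = payload.getBytes("UTF-8")
        if (zk.exists(path, false) == null)
          zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)
        else
          zk.setData(path, data, -1) // -1 accepts any znode version
      }
    }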

We use 4 ‘queue class’ (tumblr speak for a box with 72GB of RAM and 2 mirrored disks) machines, each capable of handling roughly 100k messages per second, to support the entire stream. Those 4 machines provide a message backlog of 1 week, allowing clients to drop into the stream anywhere in the past week.

As I mentioned on twitter, I’m quite proud of the software and the team behind it. Many thanks to Derek, Danielle and Wiktor for help and feedback.

If you’re interested in this kind of distributed systems work, we’re hiring.

coderspiel:

Building Network Services with Finagle and Ostrich (by Tumblr’s Blake Matheny for ny-scala)

Scala group eh? Bridgeport knows what’s up.



Scala is for Drivers

Totally agree with the author’s sentiments about Scala vs Java and the perception that each has.

Callbacks, synchronous and asynchronous

coderspiel:

In the end it works, but imagine what happens if callback-based APIs become popular and every jar you use with a callback in its API has to have its own thread pool. Kind of sucks. That’s probably why Netty punts on the issue. Too hard to make policy decisions about this in a low-level networking library.

/cc Waffle, who said some of this before.

Picking Up Scala

I’ve been evaluating a couple of queue systems recently (Kafka from LinkedIn and Kestrel from Twitter) that are both written in Scala. In evaluating the systems I needed to become more than conversationally familiar with the language. When I lived in Indianapolis my company hosted the Scala meetups, and while I occasionally attended and did some reading on the subject, I never actually coded anything with it. My friend JR, organizer of the Indianapolis Scala meetup, recommended I check out the Scala for Java Refugees series to get started. The series is fantastic but was written back in ~2008 and isn’t entirely consistent with the latest and greatest from Scala 2.9. Below is a list of my notes regarding things that I had to change for Scala 2.9.

  • The Application trait is deprecated, use App
  • Using the java Math constants is deprecated in favor of the scala.math package (see the sketch below)
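For instance, a 2.9-friendly take on the series’ early hello-world style example looks something like this (my own minimal sketch, not code from the series):

    // Extend App rather than the deprecated Application trait, and use the
    // scala.math package instead of the java Math constants.
    object HelloRefugee extends App {
      println("pi is roughly " + scala.math.Pi)
    }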

Well, that’s it. I started this post while on the first installment of the series and assumed it would be riddled with issues, but in fact there were very few. It only took a couple of hours to get through, including pounding out the examples. I hope to wrap up my ‘learn a new programming language’ exercises this weekend so I can get through some of the Kafka/Kestrel code this week.