Blake Matheny

Month

July 2011

15 posts

“Ephemeral nodes are useful when you want to implement [tbd]” —The Zookeeper Documentation, indicating that ephemeral nodes are useful for something, something, something, dark side. (Bonus family guy quote is free)
Jul 13, 201111 notes
#documentation
All That Is Interesting: The First Zombie-Proof House → all-that-is-interesting.com

all-thats-interesting:


Somehow, ritual drunk-conversation concerning team captains for the apocalypse has become a major part of the lives of 20-somethings. Having been matured in the Grandaddy-crowned masterpiece film (put “A.M. 180” on and forget that you have a job) 28 Days Later and the best-selling …

Jul 12, 20111,696 notes
#zombies
Jul 8, 20113,293 notes
“Zuckerberg today: “Humans don’t understand exponential growth. If you fold a paper 50 times it goes to the moon and back.” Actually, it goes to the Sun, I guess he made his point.” —http://www.wolframalpha.com/input/?i=log2+%28%28distance+from+earth+to+sun+%29%2F+%28thickness+of+copy+paper%29%29
Jul 6, 20118 notes
Jul 4, 201120 notes
#proposal #shesaidyes

June 2011

13 posts

“My parents used to fly my brother and me down to visit my grandparents every summer. When I was 7, my mom had carpal tunnel surgery during one of those summers. My dad picked us up from the airport and explained during the three hour drive home that something had gone wrong and they’d had to amputate her hand and now she has a baboon hand. He kept it up the whole drive; little things like ‘There’s a lot of hair, but don’t mention it, she’s sensitive about it.’, ‘She’s still not used to how strong it is yet, it might hurt a little when she hugs you.’” —Reddit: What’s the best you’ve ever been trolled IRL?
Jun 28, 201162 notes
Jun 26, 20118 notes
#art
Sorting Maps by Value in Java

I was recently working on a problem where I had a Map which contained essentially String keys and Integer values. The data was being output by a Hadoop mapper and I wanted my reducer to only output the top 50 values per day. I needed to sort my map by the integer value and then grab only the top 50 items. The method I used to grab the top 50 was kind of gross (in my mind anyhow) so I wanted to share it and see what other ways people might solve this problem. Below is the code:

PriorityQueue<Map.Entry<String,Integer>> q = new PriorityQueue<Map.Entry<String,Integer>>(
    50,
    new Comparator<Map.Entry<String, Integer>>() {
        @Override
        public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
            return e2.getValue().compareTo(e1.getValue());
        }
    });

I then iterated over my map and shoved all the entries into the priority queue. When I iterate over the queue, I get elements back out sorted. So, here are some issues I see with this approach.

  • The memory requirement is 2n.
  • The runtime for insertion is roughly O(n*log(n)) (iterate over each item in the map, insertion into the queue)
  • The PriorityQueue implementation is unbounded (but I actually only want 50 items in it)

Is there a better (more efficient) way to do this without changing the original map?

Jun 24, 201118 notes
#java #question #software
Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j comparison → kkovacs.eu

Very high level overview of a bunch of graph/key-value/column-family/etc stores.

Jun 24, 201113 notes
#technology #nosql
Remembering Alan Turing on his 99th birthday → thinq.co.uk

Alan Mathison Turing, was an English mathematician, logician, cryptanalyst and computer scientist. He was highly influential in the development of computer science, providing a formalisation of the concepts of “algorithm” and “computation” with the Turing machine, which played a significant role in the creation of the modern computer. Turing is widely considered to be the father of computer science and artificial intelligence.

Jun 23, 201124 notes
#turing #birthday
Do you travel? What Country you like best? what continent you like best/

I love to travel. I’ve slept in 45 of the 50 states and been to 19 countries. This past year I made it to Jamaica, Puerto Rico, and Hawaii (not a country, but a beautiful destination). So far my favorite country has probably been the US (just because the landscape is so diverse), but outside the US is Austria as Vienna is just an amazingly beautiful city. Not sure what my favorite continent is but Europe is easy (and safe) to travel in.

Jun 20, 20112 notes
Picking Up Scala

I’ve been evaluating a couple of queue systems recently (Kafka from LinkedIn and Kestrel from Twitter) that are both written in Scala. In evaluating the systems I had the need to become more than conversationally familiar with the language. When I lived in Indianapolis my company hosted the Scala meetups and I occasionally went and as such did some reading on the subject, but I didn’t actually code anything with it. My friend JR, organizer of the Indianapolis Scala meetup, recommended I check out the Scala for Java Refugees series to get started. The series is fantastic but was written back in ~2008 and isn’t entirely consistent with the latest and greatest from scala 2.9. Below are a list of my notes regarding things that I had to change from scala 2.9.

  • The Application trait is deprecated, use App
  • Using the java Math constants is deprecated in favor of scala.Math

Well, that’s it. I started this post on the first installment of the series and assumed it would be riddled with issues but in fact there were very few. Only took a couple of hours to get through including pounding out the examples. I hope to wrap up my ‘learn a new programming language’ exercises this weekend so I can get through some of the Kafka/Kestrel code this week.

    Jun 19, 201113 notes
    #scala #programming #technology #software
    Using a Combiner with Hadoop Jobs

    If you’ve used Hadoop before you probably know that it’s strongly recommended to use a  combiner if your workflow/data will allow for it. The standard convention is to just re-purpose your reducer as your combiner, however that doesn’t work particularly well when you’re operating on an aggregate form in the reducer. The canonical example is computing a mean. If you’re computing the mean in your reducer, you don’t want to do that in your combiner as well as you’ll be calculating the mean of a subset.

    I recently implemented a statistical job without using a combiner. The job typically took 15-20 minutes to run. Below are the Hadoop stats for it.

    Notice the substantial filesystem activity. 10 billion bytes written, 8 billion bytes read. Also, 2 billion reduce shuffle bytes.

    I got tired of waiting on this job so refactored it to use a combiner. After the combiner was introduced the job time went from an average of 17 minutes down to an average of 3 minutes. The filesystem stats and shuffle stats were also reduced significantly. Below are the Hadoop stats.

    The combiner made a tremendous difference and reduced the time for this job by 82%. Just thought I’d share some real world numbers that might be useful to someone struggling with job time. If you don’t have a combiner, implement one.

    Jun 16, 201111 notes
    #hadoop #technology
    Jun 13, 201113 notes
    #nerdpron #office #desk #technology
    Spotify – Large Scale, Low Latency, P2P Music-on-Demand Streaming → csc.kth.se

    derekg:

    I read quite a few boring papers about various nerdy topics but this paper is not boring - it’s AMAZING.  I was already a fan of Spotify but after reading this, I am sort of blown away. A few tidbits that I didn’t know - 

    • Uses Peer-to-Peer to distribute music
    • Using Ogg Vorbis for encoding
    • Song data comes from 8.8% servers, 35.8% p2p, 55.4% cache
    • 88% of access were against 12% of the catalog
    • 60% of the catalog was access at least once a week
    • All TCP based 
    • No NAT Traversal but wide use of UPnP on home routers

    The whole paper is packed with a ton of insights. Spotify is an impressive engineering feat. 

    Two additional things that I walked away with that I thought were particularly cool.

    • Spotify uses a markov model for throughput estimations. The client doesn’t start playing until all simulations pass (e.g. indicate the track will play without a buffer underrun)
    • Neighbor selection (which peers to stay connected to) is performed based on a heuristic which determines peer utility. Peer utility is based on things like; bytes sent over time, other peers found through searches via connection, and number of tracks the peer has that the client has been interested in.
    Jun 9, 201177 notes
    #p2p #technology #spotify
    “overheard: The problem with a UDP joke is that you have no idea if people got it.” —Twitter / @Jeremy Zawodny: overheard: The problem wit … (via pukomuko)
    Jun 7, 201127 notes
    The one where Blake figures he's doing something wrong

    image

    Jun 6, 20119 notes
    #r #technology #programming
    “Go offers interesting language features, which also allow for a concise and standardized notation. The compilers for this language are still immature, which reflects in both performance and binary sizes.” —Robert Hundt from Google recently published “Loop Recognition in C++/Java/Go/Scala”, a comparative benchmark of the four languages. The quote kind of reflects a “Yeah Go is cool, but slow and chubby” attitude which is interesting considering Go came out of Google. C++ was the fastest by a wide margin but as the author noted, required extensive tuning.
    Jun 3, 201113 notes
    #technology #programming #google

    May 2011

    12 posts

    github I Love You But You're Bringing Me Down

    In roughly 2003 I was working on the Linux kernel for a research project at Purdue. I didn’t have internet access at home at the time and wanted to be able to take work between my lab on campus and my office. The USB card reader I had didn’t work on Linux at the time so I created a small patch for the kernel to support it. Then I submitted it, and waited, and waited, and waited. The USB devices maintainer was quick to acknowledge, but then it was months before it made its way to an official kernel release and years before it made its way to most distributions. This cycle time wasn’t specific to Linux either, and I found this type workflow pretty common for open source projects.

    Contrast this ‘old style’ workflow with what git and github has allowed us to do. Recently I submitted patches to hiredis and webdis for two unrelated issues. In both cases the authors acknowledged the pull request on the same day. The webdis author integrated the change the same day, and the hiredis author worked with me to make some changes to the patch before accepting it less than a week later. Now the code is out there for anyone to use and it didn’t really take me any effort at all. On top of that, github makes it so trivial to fork that if I hadn’t gotten a response I could have just forked the project and maintained it myself. Github would allow that fork to be visible for the world to see and use.

    Until github came about, open source projects were relegated primarily to the confines of sourceforge and freshmeat. There seems to be this revival of open source software, largely due to how easy to use github is, that doesn’t require ‘participating in open source’. Github did something brilliant which was to take the open source out of open source. These days it’s just what you do, it’s not a decision to be made or something to be discussed.

    Github has made my open source, vim, Linux, windowmaker elitist attitude a thing of the past. And yeah, that kind of brings me down.

    May 31, 201176 notes
    #github #technology #opensource
    May 25, 2011168 notes
    #comedy #technology
    Next page →
    2012 2013
    • January 17
    • February 8
    • March 17
    • April 18
    • May 8
    • June 6
    • July
    • August
    • September
    • October
    • November
    • December
    2011 2012 2013
    • January 2
    • February 7
    • March 6
    • April 9
    • May 13
    • June 14
    • July 12
    • August 8
    • September 17
    • October 9
    • November 7
    • December 7
    2010 2011 2012
    • January
    • February
    • March 2
    • April 14
    • May 12
    • June 13
    • July 15
    • August 7
    • September 13
    • October 4
    • November 3
    • December 2
    2009 2010 2011
    • January
    • February
    • March 1
    • April 1
    • May
    • June 13
    • July 1
    • August
    • September
    • October
    • November
    • December
    2008 2009 2010
    • January
    • February
    • March
    • April
    • May
    • June
    • July
    • August
    • September
    • October
    • November
    • December
    2008 2009
    • January
    • February
    • March
    • April
    • May
    • June
    • July
    • August
    • September 7
    • October 2
    • November
    • December 1