Hadoop Summit: Facebook creates business intelligence tool called Hive

Hive was developed iteratively by a two- or three-person team (I think Jeff Hammerbacher was also involved). It makes it easy for business analysts to ask ad hoc questions of terabytes of logfile data by abstracting MapReduce into a SQL-like dialect. Think of it as a data warehouse sitting on top of thousands of servers’ logfiles. Beneath the surface, Hive leverages Hadoop and translates SQL-like statements into MapReduce jobs.
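To make that translation concrete, here is a toy Python sketch of how a query like `SELECT page, COUNT(*) FROM logs GROUP BY page` could break down into map, shuffle, and reduce steps. This is not Hive's actual planner or code; the `logs` table, the field names, and every function below are made up for illustration, and everything runs in one process rather than on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical log records; the field names are assumptions for illustration.
LOGS = [
    {"page": "/home", "user": "a"},
    {"page": "/profile", "user": "b"},
    {"page": "/home", "user": "c"},
]


def map_phase(record):
    """Emit (group-by key, 1) for each record, mirroring COUNT(*) per page."""
    yield record["page"], 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the per-record counts for a single key."""
    return key, sum(values)


def run_query(records):
    """Run map, shuffle, and reduce in sequence over an in-memory list."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(run_query(LOGS))  # {'/home': 2, '/profile': 1}
```

The GROUP BY column becomes the map output key and the aggregate becomes the reduce function, which is roughly why this kind of SQL-like dialect fits so naturally on top of MapReduce.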


I like seeing SQL-like dialects put on top of MapReduce operations. I’m working on my own… WesQL, j/k. :)

Hive is in use by ~40 people, or ~25% of Facebook’s engineering team (which implies an engineering team of about 40 × 4 = 160). It stores a total of 22 TB of compressed data and grows by ~200 GB daily.
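The back-of-the-envelope numbers are easy to check. This is a quick sketch using only the figures quoted above; the decimal conversion (1 TB = 1000 GB) is my assumption, and the daily rate has surely not been constant, so the last figure is only a rough sense of scale.

```python
hive_users = 40        # ~40 people use Hive
share_of_team = 0.25   # ~25% of the engineering team

# Implied engineering team size.
team_size = hive_users / share_of_team

stored_gb = 22 * 1000  # 22 TB compressed, assuming 1 TB = 1000 GB
daily_gb = 200         # ~200 GB added per day

# How many days of intake at today's rate the current store represents.
days_equivalent = stored_gb / daily_gb

print(team_size, days_equivalent)  # 160.0 110.0
```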


Hive and its query language remind me of WebQL, except that it lacks strict MapReduce. Update: This model is similar to DryadLINQ, which “treats the data flow as a general graph instead of forcing it into map/reduce” (from parand.com).

[Image: the QL2 studio showing a graph of WebQL statement joins]
