Hive was developed iteratively by a two- or three-person team (I think Jeff Hammerbacher was also involved). It makes it easy for business analysts to ask ad hoc questions of terabytes’ worth of logfile data by abstracting MapReduce into a SQL-like dialect. Think of it as a data warehouse sitting on top of thousands of servers’ logfiles. Beneath the surface, Hive leverages Hadoop and translates SQL-like statements into MapReduce jobs.
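To make the "SQL-like statement becomes a MapReduce job" idea concrete, here is a minimal sketch in plain Python of how a simple GROUP BY count could be expressed as a map phase and a reduce phase. This is not Hive's actual query planner or API; the query, the `access_log` data, and all function names are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical query being illustrated (table and column names are made up):
#   SELECT page, COUNT(*) FROM access_log GROUP BY page


def map_phase(records):
    """Emit one (key, 1) pair per log record, keyed by the GROUP BY column."""
    for record in records:
        yield record["page"], 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which is what COUNT(*) amounts to here."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(records):
    """Chain the three steps; a real cluster would spread them across machines."""
    return reduce_phase(shuffle(map_phase(records)))


# Tiny in-memory stand-in for logfile data (assumed shape).
access_log = [
    {"page": "/home", "user": "a"},
    {"page": "/about", "user": "b"},
    {"page": "/home", "user": "c"},
]

result = run_query(access_log)
print(result)  # {'/home': 2, '/about': 1}
```

The appeal of a SQL-like layer is that the analyst only writes the one-line query in the comment at the top; the map, shuffle, and reduce plumbing is generated for them.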
I like seeing SQL-like dialects put on top of MapReduce operations. I’m working on my own… WesQL, j/k.
Hive is in use by ~40 people, or ~25% of Facebook’s engineering team (which implies an engineering team of roughly 40 × 4 = 160). It stores a total of 22TB of compressed data, growing by ~200GB daily.
Hive and its query language remind me of WebQL, except that it lacks strict MapReduce. Update: this model is similar to DryadLINQ, which “treats the data flow as a general graph instead of forcing it into map/reduce” (quote from parand.com).