Need a cheap MapReduce? Amazon EC2 and Hadoop is your answer.

It’s time to re-examine those long-running batch jobs. Could you partition the data to allow for MapReduce? I bet you can. I’ve always wanted an affordable way to fire up 30 servers and run MapReduce operations against giant datasets. It’s confirmed: I’m a dork.
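If you're wondering what "partition the data to allow for MapReduce" looks like in practice, here is a minimal single-process sketch in Python. It is not Hadoop code. The word-count job and the function names (`partition`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own illustrative assumptions. The point is only that each chunk can be mapped independently, which is what lets a cluster spread the work across machines.

```python
from collections import defaultdict


def partition(records, n):
    """Split records into n chunks, one per (hypothetical) worker."""
    return [records[i::n] for i in range(n)]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, workers=3):
    # Each chunk is mapped on its own; on a real cluster these calls
    # would run on separate machines rather than in one loop.
    chunks = partition(lines, workers)
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["a b a", "b c", "a"], workers=2))
```

If your batch job can be written so that the map step never needs to see another chunk's data, it is probably a candidate for this treatment.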

Tom White sent me a note this week to inform me that he had implemented a Hadoop file system on top of S3. This file system can be used as a full or partial replacement for HDFS, the Hadoop Distributed File System.

Because bandwidth between EC2 instances and data stored in S3 is not metered or billed, this is a very cost-effective way to process large amounts of data.

Hadoop Filesystem Using S3

One Response to “Need a cheap MapReduce? Amazon EC2 and Hadoop is your answer.”

  1. I want to use this.
