Cornell University

The Web Lab Hadoop Cluster: Documentation

Documentation on Hadoop:

  1. The main Hadoop web site. http://hadoop.apache.org/core/
  2. The official MapReduce tutorial. http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
  3. Hadoop classes API (0.17.2). http://hadoop.apache.org/core/docs/r0.17.2/api/

Many of Hadoop's concepts derive from two Google papers:

  1. Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters. USENIX OSDI '04, 2004. http://www.usenix.org/events/osdi04/tech/full_papers/dean/dean.pdf
  2. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System. 19th ACM Symposium on Operating Systems Principles, October 2003. http://doi.acm.org/10.1145/945445.945450

Last revised: September 2008