

First we watched the first half of MapReduce & HDFS (PDF slides).

The exercises use the Cloudera Hadoop Demo VM and the Cloudera Training Material.

Exercise 1 - Getting Familiar with Hadoop

  • Get the VMWare or VirtualBox disk image from Andy.
  • Fire it up (username/password: cloudera/cloudera).
  • Double click on 'Link to Getting Familiar with Hadoop'.
  • Follow the instructions (although please skip the 'update exercises' bit).
  • NOTE that HADOOP_HOME should be set to /usr/lib/hadoop (e.g. export HADOOP_HOME=/usr/lib/hadoop); this is not set by default!

Exercise 2 - Running a MapReduce Job
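If you would rather not write Java, a quick way to run a job on the VM is Hadoop Streaming, which pipes lines through any executable. Below is a minimal word-count sketch; the streaming jar location and the input/output paths are assumptions, so adjust them to the VM's actual layout (something like hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar -input input -output output -mapper 'wordcount.py map' -reducer 'wordcount.py reduce' -file wordcount.py).

```python
#!/usr/bin/env python
"""Minimal word count for Hadoop Streaming: one script acting as mapper or reducer."""
import sys
from itertools import groupby

def mapper(lines):
    # Emit one tab-separated (word, 1) pair per word, as Streaming expects.
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(lines):
    # Streaming sorts mapper output by key, so equal words arrive adjacent.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))

if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if stage == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

You can dry-run the same logic locally with cat input | ./wordcount.py map | sort | ./wordcount.py reduce before submitting it to the cluster.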

Exercise 3 - Advanced notions

Here are a few ideas for more advanced things to try.

  • Fire up Firefox and have a look at the Hue, HBase Master, NameNode Status and JobTracker Status pages in the bookmarks bar.
  • Modify the code from exercise 2 to parse the MIME types from the sample web crawler log supplied in ~/scape/sample.log and produce a format profile.
  • Generate a sequence file, perhaps using forqlift, and do something clever like run DROID 6 on it.
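For the format-profile idea above, a Streaming mapper only needs to pull the MIME type out of each log line; a summing reducer (like the word-count one) then turns the pairs into a profile. The regular expression below is a guess at the log layout — it assumes each line mentions the served MIME type as a bare type/subtype token — so check it against ~/scape/sample.log before running it for real.

```python
#!/usr/bin/env python
"""Sketch of a format-profile mapper: pull MIME types out of crawl-log lines."""
import re
import sys
from collections import Counter

# Hypothetical pattern: a bare "type/subtype" token such as "text/html";
# adjust to the actual layout of the sample log.
MIME_RE = re.compile(r"\b([a-z]+/[a-z0-9.+-]+)\b")

def mime_of(line):
    # Return the first MIME-type-shaped token on the line, or None.
    m = MIME_RE.search(line)
    return m.group(1) if m else None

def profile(lines):
    # Local equivalent of the full map/reduce: count lines per MIME type.
    return Counter(filter(None, (mime_of(line) for line in lines)))

if __name__ == "__main__":
    # As a Streaming mapper: emit (mime, 1) pairs for a summing reducer.
    for line in sys.stdin:
        mime = mime_of(line)
        if mime:
            print("%s\t1" % mime)
```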

So why do we need HBase?

We need something like HBase because HDFS does not cope well with large numbers of 'small' files: every file, directory and block is tracked in the NameNode's memory, so millions of small files exhaust the NameNode heap long before the disks fill up. Alternative solutions include packing small files into SequenceFiles (as in exercise 3) or Hadoop Archives (HARs).
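A rough back-of-envelope makes the problem concrete. The ~150 bytes of NameNode heap per namespace object is a commonly quoted rule of thumb, not an exact figure:

```python
# Back-of-envelope for the HDFS small-files problem. The ~150 bytes of
# NameNode heap per namespace object (file or block) is a commonly
# quoted rule of thumb, not an exact figure.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    # Each file is one inode plus its blocks, all held in NameNode memory.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

if __name__ == "__main__":
    # Ten million one-block files cost roughly 3 GB of heap for metadata
    # alone, even if every file is only a few kilobytes.
    print(namenode_heap_bytes(10_000_000))
```

The heap cost is the same whether each file holds 4 KB or 64 MB, which is why packing many small records into one large file (or into HBase) pays off.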
