
Introduction

First we watched the first half of MapReduce & HDFS (PDF slides).

See also:

The exercises use the Cloudera Hadoop Demo VM and the Cloudera Training Material.

Exercise 1 - Getting Familiar with Hadoop

  • Get the VMWare or VirtualBox disk image from Andy.
  • Fire it up. (username/pw cloudera/cloudera).
  • Double click on 'Link to Getting Familiar with Hadoop'.
  • Follow the instructions (although please skip the 'update exercises' bit).
  • NOTE that HADOOP_HOME should be set to /usr/lib/hadoop - this is not set by default!
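
Once the VM is up, a quick way to check that HDFS is reachable is to list a directory through the Java FileSystem API. This is only a minimal sketch (the ListHdfs class name is made up here, and it assumes the demo VM's default /user/cloudera home directory); compile it against the jars under /usr/lib/hadoop and run it with 'hadoop jar'.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfs {
        public static void main(String[] args) throws Exception {
            // Picks up the cluster settings from the VM's Hadoop configuration.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // List the demo user's home directory (assumed path) with file sizes.
            for (FileStatus status : fs.listStatus(new Path("/user/cloudera"))) {
                System.out.println(status.getPath() + "\t" + status.getLen());
            }
        }
    }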

Exercise 2 - Running a MapReduce Job
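
For reference, here is a minimal word-count sketch using the org.apache.hadoop.mapreduce API - not the exact code from the training material, just a self-contained example of what a MapReduce job looks like. Build it into a jar and run it with 'hadoop jar', passing an HDFS input path and an output path that does not yet exist.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in the input line.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sums the counts emitted for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }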

Exercise 3 - Advanced notions

Here are a few ideas for more advanced things to do.

  • Fire up Firefox and have a look at the Hue, HBase Master, NameNode Status and JobTracker Status pages in the bookmarks bar.
  • Modify the code from exercise 2 to parse the MIME types from the sample web crawler log supplied in ~/scape/sample.log and produce a format profile (see the mapper sketch after this list).
  • Generate a sequence file, perhaps using forqlift, and do something clever like run DROID 6 on it.
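
A possible starting point for the format-profile idea: swap the word-count mapper for one that pulls a MIME type out of each log line and emits (mimeType, 1), keeping the summing reducer from exercise 2 unchanged. The regex below is only a guess at what a MIME type looks like in a crawler log; check the actual layout of ~/scape/sample.log and adjust the extraction accordingly.

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (mimeType, 1) per log line; pair with a summing reducer for a format profile.
    public class MimeTypeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        // Hypothetical pattern: matches tokens that look like "type/subtype".
        private static final Pattern MIME = Pattern.compile("\\b([a-z]+/[a-z0-9.+-]+)\\b");
        private final Text mimeType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher m = MIME.matcher(value.toString().toLowerCase());
            if (m.find()) {
                mimeType.set(m.group(1));
                context.write(mimeType, ONE);
            }
        }
    }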

So why do we need HBase?

We need something like HBase because HDFS does not cope well with large numbers of small files: every file, directory and block takes up memory on the NameNode, and processing many tiny files is inefficient for MapReduce. See http://www.cloudera.com/blog/2009/02/the-small-files-problem/ for more information and some alternative solutions.
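
One of the alternatives discussed in that post is to pack many small files into a single SequenceFile, which is also roughly what forqlift does for the DROID idea above. Here is a rough sketch of doing it by hand with the SequenceFile.Writer API; the PackSmallFiles class name and its argument layout (an HDFS output path followed by a list of local files) are made up for illustration.

    import java.io.File;
    import java.io.FileInputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackSmallFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path(args[0]);  // e.g. /user/cloudera/small-files.seq

            SequenceFile.Writer writer =
                    SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
            try {
                // One record per small file: key = file name, value = raw file contents.
                for (int i = 1; i < args.length; i++) {
                    File f = new File(args[i]);
                    byte[] bytes = new byte[(int) f.length()];
                    FileInputStream in = new FileInputStream(f);
                    try {
                        IOUtils.readFully(in, bytes, 0, bytes.length);
                    } finally {
                        in.close();
                    }
                    writer.append(new Text(f.getName()), new BytesWritable(bytes));
                }
            } finally {
                writer.close();
            }
        }
    }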
