Due to the heavy use of MapReduce in the SCAPE project, the resource management capability of YARN, the next generation of the Apache Hadoop MapReduce framework, has been investigated in detail.
YARN stands for "Yet Another Resource Negotiator". As the major advancement over the plain MapReduce framework, this new release contains a cluster-wide Resource Manager component. As the name suggests, this component manages the computational resources available in a cluster environment. Furthermore, it is responsible for job scheduling and monitoring. In the former version of Hadoop MapReduce, these two functions were served by the Job Tracker, whose resource management capabilities were confined to tracking free and busy worker nodes.
In contrast, the new Resource Manager offers more differentiated functionality in this respect. Applications may require various allocations of diverse computational resources such as memory, CPU, disk and network, and request them from the Resource Manager by defining a Container. Given those specific requirements, the Resource Manager can decide which cluster resources to reserve and where to enqueue the requests within a cluster-wide schedule. Scheduling policies are also configurable through the Resource Manager's Scheduler component. Hence resources can be allocated more effectively to an application's specific needs.
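The request-and-allocate idea described above can be illustrated with a small Python sketch. It is a toy model only and does not use or mirror the YARN API: the names `ContainerRequest`, `Node` and `ToyResourceManager` are hypothetical, the placement rule (first node with enough free memory) and the FIFO queue are assumptions made for illustration, and only memory is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerRequest:
    """Hypothetical container request: an application asks for some memory."""
    app_id: str
    memory_mb: int


@dataclass
class Node:
    """Hypothetical worker node with a pool of free memory."""
    name: str
    free_memory_mb: int


class ToyResourceManager:
    """Toy stand-in for a resource manager (not the YARN implementation)."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = list(nodes)
        self.pending: List[ContainerRequest] = []  # assumed FIFO queue

    def request(self, req: ContainerRequest) -> Optional[str]:
        """Reserve memory on the first node that can satisfy the request.

        Returns the node name, or None if the request had to be queued.
        """
        for node in self.nodes:
            if node.free_memory_mb >= req.memory_mb:
                node.free_memory_mb -= req.memory_mb
                return node.name
        self.pending.append(req)
        return None


if __name__ == "__main__":
    rm = ToyResourceManager([Node("slave1", 2048), Node("slave2", 1024)])
    for app, mem in [("app1", 1536), ("app2", 1024), ("app3", 1024)]:
        placed = rm.request(ContainerRequest(app, mem))
        print(app, "->", placed if placed else "queued")
```

A real scheduler would additionally apply configurable policies, such as queues and capacities, when ordering pending requests; the sketch deliberately leaves this out.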
Furthermore, the framework is no longer strictly tied to the MapReduce programming paradigm. MapReduce v2 has become one type of distributed application deployable on a YARN-driven cluster, alongside others which may implement different parallelization models. The MapReduce version for YARN is compatible with the previous stable release 0.20.205.
YARN, MapReduce v2 and HDFS were installed on a Eucalyptus machine image (emi-F64A14C5) on the cluster infrastructure of AIT using Cloudera's CDH4 distribution and deployed on three nodes (one master acting as Resource Manager, two slaves). Two example applications were executed successfully:
A non-MapReduce application simply executing a shell command or script on each node.
The classic word count example counting words in the files of an input directory on HDFS.
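For readers unfamiliar with the word count example, its logic can be sketched in a few lines of Python. This is an in-memory illustration of the map, shuffle and reduce steps, not the Hadoop job that was run on the cluster; the function names are hypothetical, and splitting on whitespace is an assumed tokenization.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted counts by word, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over the input lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

On a cluster, the map and reduce functions run in parallel on different nodes and the input lines come from files on HDFS; here everything runs in a single process.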
Existing MapReduce applications of the SCAPE project (e.g. the toolwrapper) could not be tested because of API incompatibilities between the employed Hadoop version 0.20.203 and MapReduce v2. Furthermore, the YARN framework is not recommended for production use at the current state of development, and resource management is only supported for memory.
However, due to the heavy use of Hadoop MapReduce in the SCAPE project, it is highly recommended to track the further development of YARN and to consider its employment in the future.
MapReduce v2 vs. YARN: http://blog.cloudera.com/blog/2012/10/mr2-and-yarn-briefly-explained/
Cloudera Installation Guide: https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation#CDH4Installation-Step3%3AInstallCDH4withYARN