The SCAPE preservation platform is largely based on the large-scale distributed MapReduce infrastructure, namely on the Apache Hadoop open-source implementation and on the revised and patched distribution of it that Cloudera maintains, in its version CDH3. The Cloudera distribution is used mainly because Cloudera keeps it patched with fixes for various bugs, as well as performance and security improvements, that become available before the corresponding major Apache release. Last but not least, Cloudera maintains very good documentation of the installation and maintenance procedures for this software stack.
At the time of writing, the versions distributed in CDH3 Update 2 are:
- Hadoop 0.20.2
- HBase 0.90.4
- Zookeeper 3.3.3
In the first phase, Cloudera's Hadoop and its MapReduce framework will be used in the platform, so that part of the stack must be installed. Later on, other components may be added to the platform (e.g., Oozie, a MapReduce workflow manager).
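To illustrate the kind of job the MapReduce framework runs, here is a minimal word-count sketch written for Hadoop Streaming, the contrib module that lets mappers and reducers be arbitrary executables reading lines on stdin and writing tab-separated key/value lines to stdout. The script name `wordcount.py` and its `map`/`reduce` argument are illustrative choices, not part of any SCAPE tool, and the sketch assumes Python 3 is available on every node.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (illustrative sketch, hypothetical script).

Hadoop Streaming feeds input lines to the mapper on stdin and reads
tab-separated key/value lines from its stdout; the framework sorts the
mapper output by key before piping it into the reducer.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def parse(lines):
    """Turn 'word<TAB>count' lines back into (word, int) pairs."""
    for line in lines:
        word, count = line.rstrip("\n").split("\t", 1)
        yield word, int(count)


def reducer(pairs):
    """Sum the counts per word; relies on the input being sorted by word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run(role, stdin, stdout):
    """Run the map or reduce phase over stdin, writing results to stdout."""
    if role == "map":
        for word, count in mapper(stdin):
            stdout.write(f"{word}\t{count}\n")
    elif role == "reduce":
        for word, total in reducer(parse(stdin)):
            stdout.write(f"{word}\t{total}\n")
    else:
        raise ValueError(f"unknown role {role!r}; expected 'map' or 'reduce'")


# Only act when a role argument is given, e.g. "wordcount.py map".
if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1], sys.stdin, sys.stdout)
```

The same pipeline can be tried locally with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` stands in for Hadoop's shuffle phase. On the cluster, the script would be handed to the streaming jar through its `-mapper`, `-reducer` and `-file` options; the jar's location depends on the installation.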
The IMF recommends installing on one of the Linux distributions for which Cloudera provides packages (e.g., Debian Squeeze, SUSE, Red Hat). The installation instructions can be found at CDH3 Installation. The out-of-the-box settings can be used for initial experimentation.
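After following the installation instructions, a quick check that the packaged Hadoop matches the release listed above can save debugging time later. The sketch below assumes that `hadoop version` prints a first line of the form `Hadoop 0.20.2-cdh3u2`; the `EXPECTED` tuple and the function names are hypothetical and should be adapted to the update actually installed.

```python
"""Smoke test: does the installed Hadoop match the expected CDH3 release?"""
import re
import shutil
import subprocess

# Assumed target: Hadoop 0.20.2 as shipped in CDH3 Update 2 (hypothetical tag format).
EXPECTED = ("0.20.2", "cdh3u2")


def parse_hadoop_version(output):
    """Extract (base_version, cdh_tag) from `hadoop version` output, or None."""
    match = re.search(r"^Hadoop (\d+(?:\.\d+)*)(?:-(cdh\w+))?", output, re.MULTILINE)
    if match is None:
        return None
    return match.group(1), match.group(2)


def installed_hadoop_version():
    """Run `hadoop version` if it is on PATH; return the parsed version or None."""
    if shutil.which("hadoop") is None:
        return None
    try:
        result = subprocess.run(["hadoop", "version"], capture_output=True,
                                text=True, timeout=60)
    except subprocess.TimeoutExpired:
        return None
    return parse_hadoop_version(result.stdout)


if __name__ == "__main__":
    found = installed_hadoop_version()
    if found is None:
        print("hadoop command not found or version not recognised")
    elif found == EXPECTED:
        print("OK: Hadoop %s-%s" % found)
    else:
        print("Unexpected version %s (expected %s)" % (found, EXPECTED))
```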
If you are interested in installing HBase, which is considered a promising candidate for serving as the base of the SCAPE Repository, you can follow the instructions at HBase installation and Zookeeper installation. Again, the default parameters should suffice for initial testing and deployment.
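Since HBase relies on a running Zookeeper service, one way to confirm the Zookeeper installation before starting HBase is Zookeeper's `ruok` four-letter command, to which a server running in a non-error state replies `imok`. This sketch assumes the default client port 2181 on `localhost`; the function name `zookeeper_is_ok` is hypothetical. An `imok` reply only shows that the server process is up, not that an ensemble has reached a quorum.

```python
"""Check that a Zookeeper server answers the 'ruok' four-letter command."""
import socket


def zookeeper_is_ok(host="localhost", port=2181, timeout=5.0):
    """Return True if the server at host:port replies 'imok' to 'ruok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # The server writes its answer and then closes the connection.
            while True:
                chunk = sock.recv(16)
                if not chunk:
                    break
                reply += chunk
    except OSError:  # refused, unreachable, or timed out
        return False
    return reply == b"imok"


if __name__ == "__main__":
    # Assumes a local server on the default client port.
    print("Zookeeper OK" if zookeeper_is_ok() else "Zookeeper not responding")
```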
The SCAPE-related tools and software can then be downloaded from the SCAPE Repository.