The SCAPE Execution Platform provides a massively parallel environment for executing and orchestrating preservation tools and workflows. The SCAPE Central Instance (CI) provides a shared infrastructure for deploying the Platform software components as well as other project outcomes. The following shared infrastructures are presently being set up:
- A Hadoop cluster with pre-installed PT-tools and testing corpora at IMF (aka The Public Instance).
- An Infrastructure as a Service (IaaS) environment at AIT (~20 cores, based on XEN/Eucalyptus).
The Platform Concept Release (M14) comprises two shared data processing clusters hosted at IMF and AIT, both based on Apache Hadoop. The setup of the IMF infrastructure hosting the Central Instance differs from the development infrastructure hosted by AIT. Most notably, the Central Instance at the IMF data center runs on low-power-consumption nodes without a virtualization layer, whereas AIT provides a development cluster hosted on top of a virtualized private cloud environment. Both deployments presently support MapReduce, HDFS, HBase, and Hoop (HDFS over HTTP). Preservation tools can be installed on demand. A Fedora Commons-based repository is presently being added to the AIT infrastructure.
Software and Downloads
The SCAPE Platform comprises (1) an integrated system that extends the execution platform with a number of software components developed within the project, and (2) tools, scripts, and applications that enable users to execute preservation workflows on top of this environment.
- A tool wrapper that allows users to easily execute command-line applications as MapReduce jobs has been developed and made available here: pt-mapred-demo.
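The internals of pt-mapred-demo are not described on this page. Purely as an illustration of the general idea, here is a minimal sketch, assuming Hadoop Streaming, of how a command-line tool could be run as the map step of a job: each input line is treated as one record (for example a file path), the tool is invoked once per record, and one tab-separated line is emitted per record. The `{input}` placeholder, the output layout, and the script itself are hypothetical and are not taken from pt-mapred-demo.

```python
"""Hypothetical Hadoop Streaming-style mapper that wraps a command-line tool.

Not the pt-mapred-demo implementation: an illustrative sketch only.
Each stdin line is one record (e.g. a file path); the wrapped tool runs once
per record and the mapper prints "record<TAB>exit_code<TAB>stdout".
"""
import subprocess
import sys
from typing import Iterable, Iterator, List


def build_command(template: List[str], record: str) -> List[str]:
    # Substitute the hypothetical "{input}" placeholder with the current record.
    return [arg.replace("{input}", record) for arg in template]


def run_tool(template: List[str], record: str) -> str:
    # Run the wrapped tool without a shell and capture its output as text.
    result = subprocess.run(
        build_command(template, record),
        capture_output=True,
        text=True,
    )
    # Collapse all whitespace (tabs, newlines) so each result stays on one line.
    out = " ".join(result.stdout.split())
    return f"{record}\t{result.returncode}\t{out}"


def map_records(template: List[str], lines: Iterable[str]) -> Iterator[str]:
    # Emit one tab-separated line per non-empty input record.
    for line in lines:
        record = line.strip()
        if record:
            yield run_tool(template, record)


if __name__ == "__main__":
    # Usage: mapper.py <tool> [args...], where one argument may contain "{input}".
    cmd = sys.argv[1:]
    if cmd:
        for output_line in map_records(cmd, sys.stdin):
            print(output_line)
    else:
        print("usage: mapper.py <tool> [args...]", file=sys.stderr)
```

With Hadoop Streaming, a script like this would be passed as the `-mapper` command over an input file that lists one path per line. Because the tool is executed on whichever node runs the map task, it has to be installed on every node of the cluster, which matches the on-demand tool installation mentioned above.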