Wrapping Existing Tools
In the following, we list requirements and restrictions for preservation tools to be executed on the Scape Platform. Please note that these restrictions apply only to preservation tools intended to run in the Platform's cluster environment. They do not apply to tools deployed in other application contexts (for example, graphical rendering environments or standalone tool services).
- No graphical environments (e.g. XServer) supported
- Execution of Java-based tools and wrapped Linux binaries (tools must be preinstalled on the cluster nodes)
- Execution on Windows platforms may be possible but will not be investigated in year one
- Data can be directly passed to an application using streams (stdin/stdout pipes)
- File pointers are also supported but may require the generation of local temporary files on the cluster (except for pointers to HDFS/HBase)
- Tool configuration shall be provided using templates/profiles (developed within TB/TCC)
- Tools must provide an XML specification (called a toolspec) that describes how they are invoked by the wrapper. This can include a default behavior, different invocation patterns, and supported parameters
- The purpose of the toolspec is to support different tool wrappers and runtime environments (e.g. standalone, Hadoop, PACT)
- The initial version of the cluster will be based on Apache Hadoop (including MapReduce and HDFS)
- Generic wrappers for executing preservation tools as MapReduce jobs will be provided
- Cluster applications must be executed via the command line (in year one)
- A web service for orchestrating the cluster via the Taverna workbench will be developed (in year two)
- The cluster supports passing input data by reference only. A number of data sources local to the cluster (HDFS, HBase, local file system, ...) will be supported, and methods to populate these data sources will be provided. Data staging within an orchestrated workflow will, however, not be supported.
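As a rough illustration of what a toolspec could look like, the fragment below sketches an XML description of a tool with a default invocation pattern and one parameter. The element and attribute names here are assumptions for illustration only and do not reflect the actual toolspec schema developed in the project.

```
<!-- Hypothetical toolspec sketch; element names are illustrative -->
<tool name="file-identify" version="1.0">
  <operations>
    <operation name="identify" default="true">
      <!-- Template expanded by the wrapper at invocation time -->
      <command>file --mime-type ${input}</command>
      <inputs>
        <input name="input" required="true"/>
      </inputs>
      <parameters>
        <parameter name="mime-type" type="flag" default="true"/>
      </parameters>
    </operation>
  </operations>
</tool>
```

A generic wrapper (standalone, Hadoop, or PACT) would read such a specification, substitute the input references and parameters into the command template, and execute the resulting command line.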
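To illustrate the stream-based invocation described above, the following sketch shows how a generic wrapper might pass data to a wrapped Linux binary via stdin/stdout pipes, avoiding temporary files. This is a minimal sketch, not the Platform's actual wrapper; the function name and the use of `tr` as a stand-in tool are illustrative assumptions (the real command and arguments would be taken from the toolspec).

```python
import subprocess

def run_tool_streaming(cmd, input_bytes):
    """Invoke a wrapped command-line tool, passing the input data
    directly via stdin and collecting the result from stdout,
    so no local temporary files are needed."""
    proc = subprocess.run(
        cmd,
        input=input_bytes,        # data streamed to the tool's stdin
        stdout=subprocess.PIPE,   # tool output captured from stdout
        stderr=subprocess.PIPE,
        check=True,               # raise if the tool exits non-zero
    )
    return proc.stdout

# Stand-in for a preservation tool: uppercase the stream with 'tr'.
result = run_tool_streaming(["tr", "a-z", "A-Z"], b"hello scape")
```

The same pattern extends to file pointers: a wrapper would first resolve the reference (e.g. copy an HDFS object to a local temporary file) and substitute the resulting path into the command line.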