This page documents SCAPE system requirements. These should eventually be broken down into sections; the "type" column is an initial attempt to provide some level of categorisation.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
|SCAPE Components SHALL be registered with the SCAPE Component Catalogue and provide – besides semantic descriptions – a specification of their context dependencies (required operating system, package dependencies).||Components, Component Catalogue||D5.2||Minor reword from D5.2|
|Platform (or parallel) components MUST execute as MapReduce applications on the cluster (this is the responsibility of the component developer).||Execution Platform||D5.2|
|Complex (or composite) workflows (i.e. involving multiple MapReduce applications) MUST be implemented as parallel workflows.||Workflows||D5.2|
|The workflow language MUST be supported by the Job Submission Service.||Job Submission Service||D5.2||Split from previous requirement|
|Platform components and their dependencies (like wrapped preservation tools) MUST be deployed on the execution platform prior to their registration with the platform’s Application Registry.||Components, Execution Platform, Deployment||D5.2|
|Deployed platform components MUST be registered with the platform’s Application Registry developed in the context of PT.WP2 Application Provisioning.||Components, Application Registry||D5.2|
|It MUST be possible for a client that wishes to execute components via the Job Submission Service to identify the components available on the platform. This functionality SHOULD be provided by the Application Registry developed in the context of PT.WP2 Application Provisioning.||Platform Application Registry||D5.2|
|There MUST be a mechanism to resolve the identifier assigned by the Platform Application Registry from the corresponding component ID assigned by the SCAPE Component Catalogue (if the two are not the same).||Platform Application Registry||D5.2|
|The Platform Application Registry SHOULD take tool dependencies into account, revealing and/or validating them.||Platform Application Registry||D5.2||"Should take tool dependencies into account" means what? For what purpose?|
|There MUST be a defined procedure for cluster administrators to register and unregister platform components.||Platform Application Registry||D5.2|
|There MUST be a procedure for users to browse the Platform Application Registry.||Platform Application Registry||D5.2|
|Besides the component identifier, the registry MUST provide sufficient information for a client to configure and execute the component on the platform using the JSS.||Platform Application Registry||D5.2||Split from previous requirement|
|The identification mechanism used by the application registry MUST be applicable to composite applications (i.e. workflows).||Platform Application Registry||D5.2|
|Services (like the Data Connector API) MUST provide a client component (e.g. in the form of an executable Java archive) that can be added to a JSS workflow as a sequential pre/post-processing activity. This component SHOULD, for example, support the transfer of a set of SCAPE Digital Objects to a configurable HDFS location. The client component SHOULD be configurable via command-line parameters.||D5.2|
|It is the application developer’s responsibility to organize the output data of a computation (say, a file characterization) such that it can be loaded into the desired data sink (say, a SCAPE digital object repository) using the provided data-transfer client libraries (e.g. supporting the ingest of METS records into the repository). This can be achieved, for example, by implementing a MapReduce application that generates METS records as output using the SCAPE Digital Object Model (Java API) and the SCAPE METSFileFormat for Hadoop.||D5.2|
|In the context of SCAPE, there is a significant difference between the role of a “component developer”, who ensures that a particular tool or piece of functionality (e.g. ImageMagick convert) is available on the parallel execution environment, and the role of an “application/workflow developer”, who implements a scenario/use case on the platform based on the available components. It is the responsibility of the workflow developer to ensure the integration of the various components used in the workflow (like data source, data cleaning, processing, and data sink components).||D5.2|
|The JSS MUST provide a job specification language capable of enacting applications developed for the different Hadoop data-analytic frameworks/languages utilized in SCAPE.||Job Submission Service||D5.2|
|The JSS MUST also be able to enact sequential programs, like those required to export data sets from a digital object repository.||Job Submission Service||D5.2|
|The JSS MUST be integrated with the Platform Application Registry, allowing a user to select an application based on its SCAPE Component identifier.||Job Submission Service||D5.2|
|The JSS MUST provide means to specify workflows that are composed of multiple SCAPE Components and other (sequential) applications running on the cluster.||Job Submission Service||D5.2|
|The JSS MUST be implemented as a REST-based service that provides the functionality described in the Platform Architecture (SCAPE deliverable D4.1).||Job Submission Service||D5.2|
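The identifier-resolution requirement above (mapping a SCAPE Component Catalogue ID to the identifier assigned by the Platform Application Registry, where the two differ) could be met with a simple lookup component. The following Java sketch is purely illustrative: the class, method names, and identifier formats are assumptions, not part of any SCAPE API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the required ID-resolution mechanism.
// In practice this mapping would be maintained by the Platform
// Application Registry itself; all identifiers below are made up.
public class IdResolver {

    // Component Catalogue ID -> Platform Application Registry ID
    private final Map<String, String> catalogueToRegistry = new HashMap<>();

    public void register(String catalogueId, String registryId) {
        catalogueToRegistry.put(catalogueId, registryId);
    }

    // Resolve the registry ID for a given Component Catalogue ID.
    public Optional<String> resolve(String catalogueId) {
        return Optional.ofNullable(catalogueToRegistry.get(catalogueId));
    }

    public static void main(String[] args) {
        IdResolver resolver = new IdResolver();
        resolver.register("scape-component:imagemagick-convert", "platform-app-42");
        System.out.println(
            resolver.resolve("scape-component:imagemagick-convert").orElse("unknown"));
    }
}
```

A real implementation would of course back this with the registry's persistent store rather than an in-memory map; the sketch only shows the resolution contract the requirement asks for.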
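The note above on organizing MapReduce output for a data sink can be illustrated with a minimal example of formatting one characterization result as a METS-like XML record. This is a sketch only: the element and attribute names are simplified placeholders, not the actual SCAPE METS profile or the SCAPE Digital Object Model API.

```java
// Illustrative sketch: emit one METS-like record per characterized file,
// as a MapReduce application's map() output might do before the record is
// ingested into a repository. Element names are placeholders, not the
// real SCAPE METS serialization.
public class MetsRecordSketch {

    public static String toRecord(String objectId, String mimeType) {
        return "<mets OBJID=\"" + objectId + "\">"
             + "<fileSec><file MIMETYPE=\"" + mimeType + "\"/></fileSec>"
             + "</mets>";
    }

    public static void main(String[] args) {
        // One output record per file characterization result.
        System.out.println(toRecord("obj-001", "image/tiff"));
    }
}
```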
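Since the JSS requirements call for a REST-based service that accepts jobs referencing registered components, a client interaction might look like the following sketch. The endpoint, JSON field names, and identifiers are hypothetical assumptions; the actual job specification language is defined by the Platform Architecture (D4.1), not here.

```java
// Hypothetical sketch of building a JSS job-submission request body.
// Field names ("component", "input", "output") and the HDFS paths are
// illustrative; the real job specification language is defined in D4.1.
public class JobRequestSketch {

    public static String buildBody(String componentId, String inputPath, String outputPath) {
        return "{\"component\":\"" + componentId + "\","
             + "\"input\":\"" + inputPath + "\","
             + "\"output\":\"" + outputPath + "\"}";
    }

    public static void main(String[] args) {
        // A client would POST this body to the JSS (endpoint illustrative,
        // e.g. POST /jobs) and poll the returned job resource for status.
        System.out.println(buildBody("platform-app-42", "hdfs:///in", "hdfs:///out"));
    }
}
```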