This page provides a glossary of terms and abbreviations used throughout the SCAPE project. The status of each term/abbreviation provides the level of agreement in that term's definition.
Anyone is free to add terms (try to keep an alphabetical ordering please), definitions (for review) or comments. These will all be reviewed within TCC calls.
|Action Service||An action service is a type of digital preservation service that performs some kind of action on a digital object, e.g. migrating the object to a new file format.|
|Apache Hadoop||Framework for processing large data sets on a computer cluster. See http://hadoop.apache.org|
|Apache Pig||A high-level language for creating workflows that run on top of Hadoop/MapReduce|
|Apache Tika||Software for identifying file formats. See https://tika.apache.org/|
|ARChive format||ARC||ARC is a lossless data compression and archival format which was originally used by the Heritrix Web Crawler developed by the Internet Archive. See http://archive.org/web/researcher/ArcFileFormat.php|
|Automated Planning||A systematic and semi-automatic process that provides the ability to assess the impact of influencers and specify actionable preservation plans that define concrete courses of action and the directives governing their execution. This is the operative management of obsolescence, maximising expected value at minimal cost.||Derived from (Antunes et al., 2011)|
Maybe we should call it "Semi-automated Planning".
|Automated Watch||A systematic and semi-automatic process that provides the ability to monitor external and internal entities for changes having a potential impact on preservation, and to provide notification. The Automated Watch Component denotes the software component that supports the Automated Watch process.||Derived from (Antunes et al., 2011; CCSDS, 2002; Sierman & Wheatley, 2009)|
|Azure Platform||A cloud-based service providing virtualized services such as Hadoop clusters|
|Bitstream|| A bitstream is contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes. A bitstream cannot be transformed into a standalone file without the addition of file structure (headers, and so forth) and/or reformatting to comply with a particular file format.
|CDX file format||Index file of ARC (see corresponding glossary entry) or WARC (see corresponding glossary entry) container files used by the Wayback machine (see corresponding glossary entry) to render archive web pages. See http://archive.org/web/researcher/cdx_file_format.php|
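The CDX entry above can be made concrete with a small parsing sketch. This assumes the common 11-field space-separated CDX layout (urlkey, timestamp, original URL, MIME type, HTTP status, digest, redirect, meta tags, compressed size, offset, container filename); actual CDX files declare their field layout in a header line and may differ.

```python
# Minimal sketch of parsing one CDX index line, assuming a common
# 11-field layout. Real CDX files declare their fields in a header line,
# which a robust parser should read instead of hard-coding this list.
FIELDS = ["urlkey", "timestamp", "url", "mimetype", "status",
          "digest", "redirect", "meta", "size", "offset", "filename"]

def parse_cdx_line(line):
    """Split a space-separated CDX line into a field dictionary."""
    values = line.strip().split(" ")
    return dict(zip(FIELDS, values))

record = parse_cdx_line(
    "org,example)/ 20130101000000 http://example.org/ text/html 200 "
    "AAAA - - 1024 0 crawl-00000.warc.gz"
)
print(record["timestamp"])  # the 14-digit capture timestamp
```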
|CDX index||See CDX file format.|
|Characterisation Service||A characterisation service is a type of digital preservation service that extracts information from a digital object, such as an identifier or file-related properties.|
|Cloud||Environments which provide resources and services to the user in a highly available and quality-assured fashion, thereby keeping the total cost for usage and administration minimal and adjusted to the actual level of consumption.||European Expert Group|
|Cloud Computing||A pay-per-use model for enabling available, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service-provider interaction.|| NIST, USA
|(SCAPE) Component||SCAPE components are Taverna Components, identified by the SCAPE Preservation Components sub-project, that conform to the general SCAPE requirements for having annotation of their behaviour, inputs and outputs. SCAPE components may be stored in the SCAPE Component Catalogue.|
|(SCAPE) Component Catalogue|| The Component Catalogue is a searchable repository for the definitions of SCAPE Components, Component Families and Component Profiles. The component catalogue is implemented by the myExperiment service and implements the Component Service API.
|Component Lookup API||[Part of the Component API]|
|Component Management|| Tools and the Component Catalogue Service encompassing the creation, storage and cross-organisational sharing of SCAPE Components.
|Component Profile||A definition of an interface that a Component should conform to. A Component profile defines what input ports and output ports the Component must have, what inputs and outputs may be optionally present, and what semantic annotations may be attributed to the Component and its ports.|
|Component Registration API||[Part of the Component API] A REST API to be implemented by Digital Object Repositories to allow SCAPE components to access the content and preservation plans held on the repository.|
|Control Policies||Policies that formulate the requirements for a specific collection, a specific preservation action, or a specific designated community. This level can be human-readable, but should also be machine-readable and thus available for use by automated planning and watch tools, to ensure that the preservation actions and workflows chosen meet the specific requirements identified for that digital collection.|
|Data Connector API|
|Data locality||“Data locality” refers to the fact that Hadoop tries to assign map tasks to nodes that are close to the data, i.e. the processing cores are on the same machine as the hard disk storing the data blocks.|
|Data Publication Platform||DPP|| A platform supporting the publication of data sets, e.g. experimental SCAPE data, as Open Linked Data.
|Digital Object||A data structure principally comprised of digital material, metadata and a (persistent) identifier.|
|Digital Object Model||DOM|| A data exchange model, based on METS and PREMIS, to encapsulate digital objects and ensure a consistent and well-understood information exchange between SCAPE entities.
|Digital Object Repository||DOR||An OAIS Compliant repository that provides a data management solution for storing content and metadata about digital objects, as well as Preservation Plans. DORs implement three interfaces: Plan Management API; Data Connector API; and the Report API.|
|DROID||DROID is a software tool developed by The National Archives (UK) to identify file formats and assign each a unique identifier (PUID, see corresponding glossary entry). See http://digital-preservation.github.io/droid/|
|Entity (architectural)|| SCAPE architectural elements, e.g. the Execution Platform, the Digital Object Repository, etc. This term is used to avoid overloading the term “components”, distinguishing “architectural components” from “SCAPE Components”.
|Entity (Scout)|| A domain object that represents something of interest to Automated Watch, for example an entity may represent the JPEG2000 file format. Entities may have Properties.
|Execution Environment|| An abstract layer of the Execution Platform which provides a placeholder representing functionality to be fulfilled by a specific technology. The Execution Environment provides the physical infrastructure to perform computation. An example might be the nodes of a Hadoop cluster.
|Execution Platform||EP||An infrastructure that provides the computational resources to enact a Preservation workflow and execute Preservation actions. Abstracted into three layers: the Execution Environment, the Job Execution Service and the Job Submission Service API. Technology agnostic: could be Taverna Server or Hadoop.|
|(SCAPE) Experiment Evaluation||Findings and results, both measurable and non-measurable, of a particular execution of an Experiment, within the Testbed sub-package.|
|(SCAPE) Experiment||A unit of work that defines an implementation of a User Story, within the Testbed sub-package. It consists of a dataset, one or more preservation components, a workflow and a processing platform that can be used to evaluate SCAPE technology and provide evidence of scalable processing.|
|External Assessment API|
|ffprobe||ffprobe is a tool that belongs to the ffmpeg family and is used to gather information about multimedia files. See http://ffmpeg.org/ffprobe.html|
|File|| A file is a named and ordered sequence of bytes that is known by an operating system. A file can be zero or more bytes and has a file format, access permissions, and file system characteristics such as size and last modification date.
|File Format Characterisation||The process of determining the properties of a file format, for example, the bit depth, colour space, width of an image, the frames per second of a video, etc.|
|File Format Identification||The process of determining the identity of a file format instance, typically by assigning an identifier such as a PUID (see corresponding glossary entry) for precise identification, or a MIME Type (see corresponding glossary entry) for coarser identification.|
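To illustrate the principle behind signature-based identification (the approach DROID's signature files use), here is a toy sketch that matches leading "magic bytes". The signature table below is a hand-picked assumption for illustration, not a PRONOM signature file, and real identifiers return PUIDs rather than MIME types.

```python
# Illustrative sketch of signature-based file format identification using
# leading "magic bytes". The mapping below is a small hand-picked example,
# not derived from an actual PRONOM/DROID signature file.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF": "application/pdf",
    b"\xff\xd8\xff": "image/jpeg",
}

def identify(data: bytes) -> str:
    """Return a MIME type guess for the leading bytes, or 'unknown'."""
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return "unknown"

print(identify(b"%PDF-1.4 sample"))  # application/pdf
```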
|Guidance Policies|| Description of the general long term preservation goals of the organisation for its digital collection(s).
|Hadoop||See Apache Hadoop.|
|HDFS||Hadoop Distributed File System. This is Hadoop's file system which is designed to store files across machines in a large cluster.|
|HBase||Distributed database on top of Hadoop/HDFS, see https://hbase.apache.org|
|Heritrix Web Crawler||Web crawler engine used to harvest content from the internet and store it in a web archive. The Heritrix Web Crawler was originally developed by the Internet Archive (see corresponding glossary entry). See https://webarchive.jira.com/wiki/display/Heritrix/Heritrix|
|Intellectual Entity||IE|| A set of content that is considered a single intellectual unit for purposes of management and description – for example, a particular book, map, photograph, or database. An intellectual entity may have one or more digital representations.
|Internet Archive||The Internet Archive is a digital library which provides permanent storage of and free public access to collections of digitized materials, including websites, music, moving images, and nearly three million public-domain books. See https://archive.org|
|Investigation Research Object||IRO||A Research Object centred around an investigation|
|Job Execution Service||JES|| An abstract layer of the Execution Platform which provides a placeholder representing functionality to be fulfilled by a specific technology. The Job Execution Service provides job scheduling functionality, allocating computing tasks amongst the available hardware resources available within the Execution Environment. An example might be Taverna-Server or Hadoop.
|Job Submission Service||JSS|| An abstract layer of the Execution Platform which provides a placeholder representing functionality to be fulfilled by a specific technology. Provides the entry point to the Execution Platform, implementing a remotely accessible interface to enable a user or client application to schedule and execute workflows (jobs) on the Execution Environment. The exact interface depends on the underlying Job Execution Service and Execution Platform, but typical examples would be the Hadoop API provided over a SSH connection, or the Taverna-Server REST API over HTTP.
|Linked Data Simple Storage Specification||LDS3||A REST-based specification for the controlled publishing of Linked Data: a REST API for managing documents in a linked-data system, enabling straightforward publication of complex datasets.|
|Loader Application||A component that loads Digital Objects into a Digital Object Repository that implements the SCAPE Data Connector API.|
|Map/Reduce||MR|| A programming paradigm for processing large data sets using a parallel, distributed algorithm on a Hadoop cluster.
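The Map/Reduce programming model can be sketched in a few lines. This is a toy, single-process word count that only illustrates the two phases and the shuffle/sort step between them; on Hadoop the map and reduce tasks run in parallel across the cluster.

```python
# Toy, single-process illustration of the map/reduce programming model:
# word counting. Hadoop distributes these phases across a cluster; this
# sketch only shows the shape of the computation.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: emit a (key, value) pair per word.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Hadoop's shuffle/sort groups pairs by key before reducing.
    pairs = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["to be or not to be"])))
print(counts["to"])  # 2
```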
|Microsoft Azure Platform||See Azure Platform|
|MIME Type||A standard identifier used on the Internet to indicate the type of data that a file contains.|
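For a quick look at MIME types in practice, Python's standard library maps filename extensions to MIME type identifiers. Note that this is extension-based guessing, not content inspection, so it is weaker than signature-based identification.

```python
# Mapping filename extensions to MIME types with the Python standard
# library. This guesses from the extension only; it does not inspect
# the file's contents.
import mimetypes

mime, encoding = mimetypes.guess_type("scan.tiff")
print(mime)  # image/tiff
```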
|MyExperiment|| A web application to allow users to find, use and share scientific workflows and other Research Objects, and to build communities around them.
|Netarchive Suite||Software suite built around the Heritrix Web Crawler originally developed by the The Royal Library and The State and University Library. See https://sbforge.org/display/NAS/NetarchiveSuite|
|| Network File System
|Parallel Preservation Components||PPC|
|Plan Management API||An API to be implemented by Digital Object Repositories that provides HTTP endpoints for the retrieval and management of Preservation Plans.|
|Plan Management GUI|| A graphical user interface application that can be used to view and execute Preservation Plans on a Digital Object Repository that implements the Plan Management API.
|Plan Management Service||PMS||A service that holds a Preservation Plan and manages its lifecycle. Any service that implements the Plan Management API is a Plan Management Service. Note that a PMS may also implement other APIs and be principally known by other names.||D4.1|
|Planner|| (in Watch WP domain): An agent with an interest in the change of the state of the world over time. This can be either the planning component or a user client.
|Plato||A web-based tool that creates a Preservation Plan and provides a user interface for viewing, managing and updating that plan. The plan itself is stored in the Plan Management Service after creation.|
|Preservation Action Plan||PAP||A Preservation Action Plan is part of a Preservation Plan (or a separate document for the purposes of processing) that describes a set of digital objects, an operation (typically a transformation) to apply to each of them, and a rule for determining whether the operation on a particular digital object was successful, on the basis of characteristics measured on the instantiation of the digital object, on what it was transformed into, or on a comparison of the two. A Preservation Action Plan does not describe how to instantiate the digital object, where to archive successful transformations, or where to report the outcome of applying the PAP.|
|Preservation Component||PC||See SCAPE Component|
|Preservation Plan||A preservation plan is a live document that defines a series of preservation actions to be taken by a responsible institution due to an identified risk for a set of digital objects or records (called a collection). It is defined by Plato and stored in a Plan Management Service.|
|Preservation Policies|| Preservation policies should provide the mechanisms to document and communicate key aspects of relevance, in particular drivers and constraints and the goals and objectives motivated by them. They are to support the activities of an organisation with respect to the maintenance and preservation of a digital collection. Three levels of policy have been identified – guidance policies, procedure policies and control policies.
|Preservation Procedure Policies|| Policies that describe the approach an organisation will take in order to achieve the goals as stated on the higher level. They will be detailed enough to be input for processes and workflow design but can or will be at the same time concerned with the holdings in general.
|Program for parallel Preservation Load||PPL||An application that takes an existing Taverna Workflow as an input and automatically generates a Java class file that can be executed on a Hadoop cluster.||WP6|
|PRONOM||PRONOM is an information system about data file formats and their supporting software products. See https://www.nationalarchives.gov.uk/PRONOM|
|Pronom Unique Identifier||PUID||The PRONOM Persistent Unique Identifier (PUID) is an extensible scheme for providing persistent, unique and unambiguous identifiers for records in the PRONOM registry. Such identifiers are fundamental to the exchange and management of digital objects, by allowing human or automated user agents to unambiguously identify, and share that identification of, the representation information required to support access to an object. This is a virtue both of the inherent uniqueness of the identifier, and of its binding to a definitive description of the representation information in a registry such as PRONOM. From: http://www.nationalarchives.gov.uk/aboutapps/pronom/puid.htm|
|Pronom Signature File||Signature files are generated by PRONOM (see corresponding glossary entry) and used by DROID (see corresponding glossary entry) for file format identification. The signature file contains a subset of the information from the PRONOM knowledge base required by the DROID software to perform the file format identification. See https://www.nationalarchives.gov.uk/aboutapps/pronom/droid-signature-files.htm|
|Properties (Scout)|| Describes a certain “quality” of an Entity, for example Properties of the JPEG2000 file format might be a common name (with value: JPEG2000) or an indication of the level of tool support (with example value: limited).
|Preservation Watch||See Automated Watch|
|Representation|| A Representation is the set of files, including structural metadata, needed for a complete and reasonable rendition of an intellectual entity. There can be more than one representation for the same intellectual entity.
|Quality Level Descriptor||QLD|
|Quality Assurance Component||A Quality Assurance Component is used to determine a quality measure related to the outcome of applying an Action Service (see corresponding glossary entry) to a digital object.|
|Results Evaluation Framework||REF||A generic semantic system for evaluating large datasets of experimentation results in a simple fashion||http://www.openplanetsfoundation.org/results-evaluation-framework-first-release|
|Report API||An OAI-PMH based API to be implemented by Digital Object Repositories that enables the SCAPE Automated Watch component to retrieve information about the state of the repository.|
|Rosetta Platform||A digital preservation repository/system produced by Ex Libris|
|Scalable Preservation Environments||SCAPE||An EU-funded project developing scalable services for the planning and execution of institutional preservation strategies on an open source platform that orchestrates semi-automated workflows for large-scale, heterogeneous collections of complex digital objects.|
|SCAPE Characterisation Component|| Characterisation components are a family of SCAPE Components (defined to wrap tools produced in WP9) that compute one or more properties of a single instantiated digital object or file. The output ports that produce measures are always annotated with the metric (in the SCAPE Ontology) that describes what the component computes.
|SCAPE Component|| SCAPE components are Taverna Components, identified by the SCAPE Preservation Components sub-project, that conform to the general SCAPE requirements for having annotation of their behaviour, inputs and outputs. SCAPE components may be stored in the SCAPE Component Catalogue.
|SCAPE Component Catalogue||A catalogue of Taverna components used when creating and enacting SCAPE workflows. The component catalogue is functionality implemented by myExperiment.||D7.3|
|SCAPE Migration Component|| Migration components are a family of SCAPE Components (defined to wrap tools produced in WP10) that apply a transformation to an instantiated digital object or file to produce a new file. The input is annotated with a term (from the SCAPE Ontology) that says what sort of digital object/file is accepted, and the output is annotated with a term that says what sort of file is produced.
|SCAPE Ontology|| The SCAPE Ontology is an OWL ontology that formally defines the terms used by computing systems in SCAPE.
|SCAPE Platform||See Execution Platform|
|SCAPE QA Component|| QA components are a family of SCAPE Components (defined to wrap tools produced in WP11) that compute a comparison between two instantiated digital objects or two files. They produce at least one output that has a measure of similarity between the inputs, and that output is annotated with the metric (in the SCAPE Ontology) that describes the nature of the similarity metric.
|SCAPE Story|| A short and succinct high-level statement of the preservation issue encountered by a partner institution.
|SCAPE Utility Component|| Utility components are a family of Taverna Components that provide miscellaneous capabilities required for constructing SCAPE workflows, but which are not a core feature of the SCAPE preservation planning process. For example, they can provide assembly and manipulation of XML documents that contain collections of measures of workflows.
Note that utility components are not SCAPE components per se; they do not conform to the standard profiles. Instead, they are used in support roles.
|Scout||An Automated Watch system that provides an ontological knowledge base to centralize all necessary information to detect preservation risks and opportunities.|
|Source||A source (or Watch Source) represents specific aspects of the world that are of interest to digital preservation planners, and for which there exists a known way of investigating certain properties, providing measurements about them. Sources can be internal or external, i.e. they can be part of the organization responsible for preservation or part of the outside world, such as Format Registries, the SCAPE Component Catalogue or Human Knowledge.|
|Source Push API||An API that allows for external Sources to push information directly to the Automated Watch Component. This API allows submission of Entities and Properties that describe the aspects of the world which the Source represents.|
|Stager Application||An application that retrieves Digital Objects from a Digital Object Repository via the Data Connector API.|
|Taverna Components|| Taverna components are Taverna workflow fragments that are stored independently of the workflows that they are used in, and that are semantically annotated with information about what the behaviour of the workflow fragment is. They are logically related to a programming language shared library, though the mechanisms involved differ.
Taverna components are stored in a component repository. This can either be a local directory, or a remote service that supports the Taverna Component API (e.g., the SCAPE Component Catalogue, implemented by myExperiment). Only components that are stored in a publicly accessible service can be used by a Taverna workflow that has been sent to a system that was not originally used to create it.
|Taverna Component Profile|| An XML document that describes the constraints that a Taverna component should adhere to, and the semantic annotations that may be used with that component.
|Taverna Component Repository||A store of Taverna Components and Taverna component profiles. It is typically expected that the component repository would also be the component catalogue so that the components and their profiles can be found, and it is typically treated as a synonym of a catalogue; myExperiment is both a catalogue and a repository of Taverna components.|
|Taverna Command Line Tool||The Taverna Command Line Tool can execute a Taverna Workflow in a terminal/command prompt, without displaying a Graphical User Interface (GUI).|
|Taverna Server||TAVSERV|| Taverna Server is a multi-user service that can execute Taverna workflows. Clients do not need to understand those workflows in order to execute them.
|Taverna Workbench|| The Taverna Workbench is a desktop application for creating, editing and executing Taverna workflows.
|Taverna Workflow|| A Taverna workflow is a parallel data-processing program that can be executed by Taverna Workbench or Taverna Server. It is stored as an XML file, and has a graphical rendering.
|Tool-to-MapReduce Wrapper||ToMaR||A SCAPE-developed tool which wraps command line tasks for parallel execution as Hadoop MapReduce jobs.|
|Toolspec||An XML file written to a standard API that contains details of how to execute a tool for a particular purpose; for example, txt2pdf might define how to use a command line tool to convert text to PDF. Toolspecs can have different types, such as migration or QA.|
|Toolwrapper|| The toolwrapper is a Java tool developed in the SCAPE Project to simplify the execution of the following tasks: Tool description (through the toolspec); Tool invocation (simplified) through command-line wrapping; Artifacts generation (associated to a tool invocation, e.g., Taverna workflow); and Packaging of all the generated artifacts for easier distribution and installation
|Trigger|| A Trigger is a unit that contains Conditions used during Assessment and Notifications that are sent when the Conditions are met.
|(SCAPE) User Story||See SCAPE Story|
|Watch Request|| A Watch Request is a composition of one or more Questions, created by a Planner.
|Watch Request API||An API that allows Watch Requests to be managed (created, read, updated and deleted).
A Watch Request is a composition of one or more Watch Questions, created by a Planner.
Watch Questions are predefined points of interest related directly or indirectly to Sources and Properties. The Questions can be parameterized in order to offer some flexibility to the Planner.
A Property defines a specific part of Entities in the world that have the same Entity Type, i.e. it represents some characteristic of these Entities. It can specify a data type for its measurements.
An Entity Type describes the type of an instance (i.e. an Entity). It groups instances of the same type and helps the Planner to pose meaningful Watch Requests. Some examples are: Format, Preservation Action, Experiment, etc.
An Entity is a concrete instance of some Entity Type. E.g. ‘ImageMagick v1.0’ is a concrete instance (an Entity) and has the Entity Type Action component.
|Wayback Machine||In the context of web archiving, the term Wayback Machine refers to software used to render archived web pages, originally developed by the Internet Archive (http://archive.org).|
|Web ARChive file format||WARC||The WARC (Web ARChive) file format offers a convention for concatenating multiple resource records (data objects), each consisting of a set of simple text headers and an arbitrary data block into one long file. See http://bibnum.bnf.fr/WARC/.|
|Web Archive Record||Information unit contained in an ARC (see corresponding glossary entry) or WARC (see corresponding glossary entry) container file. This information unit can hold, for example, the record payload (bitstream of an HTML file or image) together with the HTTP response metadata and some additional metadata related to the record (date, checksum, etc.).|
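A WARC record's named headers are plain text, so their structure is easy to illustrate. The sketch below parses the headers of one uncompressed record constructed inline; a real WARC file concatenates many (often individually gzipped) records, which dedicated libraries handle properly.

```python
# Hedged sketch: reading the named header fields of a single, uncompressed
# WARC record. Real WARC files concatenate many records, each followed by
# a Content-Length-sized payload block, and are often gzip-compressed.
SAMPLE = (b"WARC/1.0\r\n"
          b"WARC-Type: response\r\n"
          b"WARC-Date: 2013-01-01T00:00:00Z\r\n"
          b"Content-Length: 0\r\n"
          b"\r\n")

def parse_warc_headers(record: bytes) -> dict:
    """Return the header fields of one WARC record as a dictionary."""
    head, _, _ = record.partition(b"\r\n\r\n")  # headers end at blank line
    lines = head.decode("utf-8").split("\r\n")
    headers = dict(line.split(": ", 1) for line in lines[1:])
    headers["version"] = lines[0]  # e.g. "WARC/1.0"
    return headers

print(parse_warc_headers(SAMPLE)["WARC-Type"])  # response
```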
|Web Crawler||Software used to capture and store web pages used by Web Archiving institutions to build their archives.|
|Web Content Testbed||WCT||The Web Content Testbed is one of the Testbeds of the SCAPE project. The Testbeds are represented by memory institutions holding large data sets that are used to test the applicability of tools, workflows, and solutions developed in the SCAPE project.|
|Web Snapshot||The image capture of a web page that is taken when a web page is rendered in a web browser.|
|Workflow Repository||A service that stores workflows, allowing them to be distributed to other people in accordance with the defined access control policies. A workflow repository that holds Taverna workflows is consequently a Taverna workflow repository. MyExperiment is an example of a workflow repository.|
- Antunes, G., Becker, C., Borbinha, J., Proença, D., & Vieira, R. (2011). Shaman reference architecture (version 3.0). SHAMAN project report. Retrieved from http://shaman-ip.eu
- CCSDS. (2002). Reference Model for an Open Archival Information System (OAIS). Retrieved from http://public.ccsds.org/publications/archive/650x0b1.pdf
- Sierman, B., & Wheatley, P. (2009). Report on the Planets Functional Model. Planets project deliverable (PP7-D3-4). Retrieved from http://www.planets-project.eu/docs/reports/Planets_PP7-D3-4_ReportOnThePlanetsFunctionalModel.pdf