
Collections:

Title
NeXus data files from instruments
Description These are data files captured directly from instruments. They contain measurements collected by the instrument detectors. There is no typical detector size or number of detectors per instrument: at the STFC ISIS facility, for example, the number of detectors ranges from several thousand to a quarter of a million. The files are typically in raw or NeXus format. The latter is an international standard for the neutron and synchrotron communities. The former is facility specific, and many historic data files are in this format. NeXus is increasingly being adopted as the standard format for instrument data.
Licensing See the STFC Data Policy for the SCAPE project
Owner STFC
Dataset Location https://scapeweb.esc.rl.ac.uk/

(please get in touch with STFC for accessing the data)

Collection expert Erica Yang (STFC)
Issues brainstorm These are individual data files produced by the experiments. Each file holds the readings of an individual experimental run. On their own, they do not contain enough information for anybody to process them because, in the STFC ISIS facility case, they are essentially neutron counts. They are raw data in that they contain errors and noise that must be removed before they can be analysed. They therefore have to be preserved alongside contextual information describing where they were produced (e.g. which instrument), when they were produced (which ISIS cycle), and which experiment they were produced for. This information makes it possible to link the raw files to the related files generated at the same time during an experiment.

Other types of contextual information that need to be preserved include the software needed to process the files and the samples used to produce them.
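The contextual information listed above could be kept as a small record stored next to each raw file. The following is a minimal sketch only: the `RawFileContext` class, its field names, and the `related_files` helper are hypothetical and are not taken from ICAT or any STFC system.

```python
from dataclasses import dataclass, field


@dataclass
class RawFileContext:
    """Context kept alongside one raw data file (hypothetical schema)."""
    file_name: str
    instrument: str      # where the file was produced
    cycle: str           # when it was produced (e.g. a facility cycle label)
    experiment_id: str   # which experiment it was produced for
    software: list = field(default_factory=list)  # software needed to process the file
    samples: list = field(default_factory=list)   # samples used to produce the file


def related_files(records, target):
    """Return names of other files from the same instrument, cycle and experiment."""
    key = (target.instrument, target.cycle, target.experiment_id)
    return [
        r.file_name
        for r in records
        if r is not target and (r.instrument, r.cycle, r.experiment_id) == key
    ]
```

Given a list of such records, `related_files(records, record)` returns the other files that share the same instrument, cycle, and experiment, which is one simple way to express the linkage described above.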


List of Issues
Title
Scientific datasets relevant to STFC facilities
Description There are three categories of scientific datasets:

Licensing See the STFC Data Policy for the SCAPE project
Owner Research investigators; once the data becomes public, it is owned by the public.
Dataset Location Raw and catalogue data availability: https://scapeweb.esc.rl.ac.uk/
Processed data is not owned by STFC and is therefore not available
Collection expert Erica Yang (STFC)
List of Issues For 1st and 2nd SCAPE years:
Title
ICAT Catalogue data
Description Comes in two formats: XML files and database records
Licensing See the STFC Data Policy for the SCAPE project
Owner STFC
Dataset Location https://scapeweb.esc.rl.ac.uk/

(for access, get in touch with STFC)

Collection expert Erica Yang (STFC)
Issues brainstorm This data is key to meaningful indexing and searching of data files within the data archive.
List of Issues Not applicable to 1st and 2nd SCAPE years


Title
Processed scientific datasets and related context (workflow, software, provenance)
Description These datasets are produced during the data reduction and analysis phases of a scientific investigation. There is no general standard format for this type of data, although there are domain-specific ones. For example, in chemistry, CML is a de facto standard for describing molecular information.
Licensing Not available as yet.
Owner Scientists, not STFC
Dataset Location Not available.
Collection expert Erica Yang (STFC)
Issues
Relevant to preservation because of the rising interest in archiving processed data and provenance, but not relevant to SCAPE because such collections are not currently held by STFC.

Issues:

Issue 1

Title
Examine the long term value of the preserved datasets
Detailed description A large collection of raw data files is added to the STFC archive every year, capturing experimental data directly from a large number of scientific instruments. We are trialling a basic bit-level preservation system on the newly created files. There is limited understanding of the value of these preserved collections. For example, how useful are they (e.g. do they contain enough information for researchers other than the original investigators to interpret them)? We need an efficient approach to measuring and examining the value of these collections so that the preservation cost can be justified and the benefits quantified.
Scalability Challenge
Every year, some facilities generate millions of raw data files per instrument, and there are often tens of instruments per facility. In addition, file sizes vary significantly from instrument to instrument: some generate files of the order of GBs, others files of the order of KBs (but in large numbers). Any approach to solving this problem therefore has to be scalable (in terms of both file size and volume) and fully automated.
Issue champion Simon Lambert (STFC)
Other interested parties
Any other parties who are also interested in applying Issue Solutions to their Datasets. Identify the party with a link to their contact page on the SCAPE Sharepoint site, as well as identifying their institution in brackets. Eg: Schlarb Sven (ONB)
Possible Solution approaches Brief brainstorm of possible approaches to solving the Issue. Each approach should be described in a single sentence as part of a bulleted list. Note that actual Solutions will be owned by the Solution Provider who should be a different person from the Issue Champion. Reaching a satisfactory conclusion for the Issue should be considered a team effort between these parties.
Datasets NeXus data files, ICAT catalogue data
Solutions Reference to the appropriate Solution page(s), by hyperlink
Evaluation Objectives Not applicable for the 1st and 2nd SCAPE years
Actual evaluations links to actual evaluations of this Issue/Scenario
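Issue 1 mentions a basic bit-level preservation system and the need for an automated approach that copes with both very large files and very many small ones. As an illustration only, a fixity check of that kind might look like the sketch below. It is not the STFC system: the function names, the manifest layout, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so GB-sized files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the names of files that are missing or whose checksum has changed."""
    root = Path(root)
    failures = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

A manifest built at ingest time can be re-verified on a schedule without human intervention. Note that this only confirms the bits are unchanged; it says nothing about the long-term value of the content, which is the open question in this issue.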

Issue 2

Title
Preserving the verifiability and provenance of processed datasets
Detailed description The challenge is to preserve the relationships between the components of research objects, not just the components themselves. We need to preserve both the components and the intrinsic relationships between them, and to allow those relationships to evolve continuously so that new components can be incorporated over time.
Scalability Challenge
Same as IS36
Issue champion Simon Lambert (STFC)
Other interested parties
Any other parties who are also interested in applying Issue Solutions to their Datasets. Identify the party with a link to their contact page on the SCAPE Sharepoint site, as well as identifying their institution in brackets. Eg: Schlarb Sven (ONB)
Possible Solution approaches Use of Preservation Network Models to record “deep” dependencies and to allow for tracking over time.
Datasets NeXus data files, ICAT data catalogue, workflow, software, and processed data
Solutions Reference to the appropriate Solution page(s), by hyperlink
Evaluation Objectives Not applicable for the 1st and 2nd SCAPE years
Actual evaluations links to actual evaluations of this Issue/Scenario

Solutions:

Title SO24 Use Preservation Network Model to record "deep" dependencies and to allow tracking over time
Detailed description The Preservation Network Model (PNM) is a methodology for recording "deep" dependencies between digital artifacts and tracking them over time (this solution is yet to be developed)
Solution Champion
Simon Lambert (STFC)
Corresponding Issue(s)
Evaluation
Not applicable to 1st and 2nd SCAPE years
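Since the solution above is yet to be developed, the following is only a rough illustration of what recording "deep" dependencies and tracking them over time could mean in practice. It is not the PNM methodology itself: the `DependencyGraph` class, the relation names, and the use of ISO date strings are assumptions made for this sketch.

```python
from collections import defaultdict


class DependencyGraph:
    """Directed graph of dated dependencies between digital artifacts (hypothetical)."""

    def __init__(self):
        # artifact -> list of (depends_on, relation, recorded_on)
        self._edges = defaultdict(list)

    def add_dependency(self, artifact, depends_on, relation, recorded_on):
        """Record that artifact depends on depends_on; recorded_on is an ISO date string."""
        self._edges[artifact].append((depends_on, relation, recorded_on))

    def deep_dependencies(self, artifact, as_of=None):
        """Return all direct and indirect dependencies recorded on or before as_of."""
        seen = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            for target, _relation, recorded_on in self._edges.get(current, []):
                if as_of is not None and recorded_on > as_of:
                    continue  # dependency was not yet known at that date
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen
```

New dependencies can be added at any time with `add_dependency`, and passing `as_of` to `deep_dependencies` reconstructs what was known at an earlier date, which is one way to read "tracking over time".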
Labels:
scenario
researchdata
rdscenarios
  1. Oct 11, 2012

    It's my impression that each scenario should have only one issue. A different issue is a different scenario. Am I wrong?

    Regarding issue 1, I'm not sure that's a preservation challenge. I see it as an appraisal and selection problem. Preservation comes after selection... right?