Title |
Examine the long-term value of the preserved datasets |
Detailed description |
A large collection of raw data files is added to the STFC archive every year, capturing experimental data straight from a large number of scientific instruments. We are trialling a basic bit-level preservation system on the newly created files. There is limited understanding of the preserved value of these collections. For example, how useful are they (e.g. do they contain enough information for researchers other than the original investigators to interpret them)? We need an efficient approach to measure and examine the value of these collections so that the preservation cost can be justified and the benefits can be quantified. |
Scalability Challenge |
Every year, some facilities generate millions of raw data files per instrument, and there are often tens of instruments per facility. In addition, file sizes vary significantly from instrument to instrument: some generate files on the order of GBs, others on the order of KBs (but in very large numbers). Any approach to solving this problem therefore has to be scalable (e.g. in terms of file size and volume) as well as fully automated. |
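As a rough illustration of what an automated, scalable measure might look like, the sketch below scores each metadata record by the fraction of descriptive fields that are filled in, then aggregates the scores per instrument in a single streaming pass. It is only one possible proxy for "value". The field names in `REQUIRED_FIELDS` and the record layout are hypothetical: they are not taken from the NeXus format or the ICAT schema, and would have to be replaced by whatever fields a re-user actually needs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical fields a third-party researcher might need to interpret a file.
# These names are assumptions for illustration, not real NeXus or ICAT fields.
REQUIRED_FIELDS = ("title", "instrument", "sample", "start_time", "investigator")


def completeness(record: Mapping[str, object]) -> float:
    """Return the fraction of REQUIRED_FIELDS that are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if record.get(f) not in (None, ""))
    return present / len(REQUIRED_FIELDS)


def summarise(records: Iterable[Mapping[str, object]]) -> Dict[str, Tuple[int, float]]:
    """Stream over records and return {instrument: (file_count, mean_score)}.

    Only running totals are kept in memory, so the input can be a generator
    over millions of records.
    """
    totals = defaultdict(lambda: [0, 0.0])  # instrument -> [count, score sum]
    for record in records:
        key = str(record.get("instrument") or "unknown")
        totals[key][0] += 1
        totals[key][1] += completeness(record)
    return {k: (n, s / n) for k, (n, s) in totals.items()}
```

Because `summarise` accepts any iterable, the records could come from a generator that reads catalogue entries or file headers one at a time, so memory use depends on the number of instruments, not the number of files.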
Issue champion |
Simon Lambert (STFC) |
Other interested parties |
Any other parties who are also interested in applying Issue Solutions to their Datasets. Identify the party with a link to their contact page on the SCAPE Sharepoint site, as well as identifying their institution in brackets. Eg: Schlarb Sven (ONB) |
Possible Solution approaches |
Brief brainstorm of possible approaches to solving the Issue. Each approach should be described in a single sentence as part of a bulleted list. Note that actual Solutions will be owned by the Solution Provider who should be a different person from the Issue Champion. Reaching a satisfactory conclusion for the Issue should be considered a team effort between these parties. |
Datasets |
NeXus data files, ICAT catalogue data |
Solutions |
Reference to the appropriate Solution page(s), by hyperlink |