Examine the long-term value of the preserved datasets
Detailed description A large collection of raw data files is added to the STFC archive every year, capturing experimental data straight from a large number of scientific instruments. We are trialling a basic bit-level preservation system on the newly created files. However, there is limited understanding of the value of these preserved collections. For example, how useful are they: do they contain enough information for researchers other than the original investigators to interpret them? We need an efficient approach to measuring and examining the value of these collections so that the preservation cost can be justified and the benefits quantified.
Scalability Challenge
Every year, some facilities generate millions of raw data files per instrument, and there are often tens of instruments per facility. In addition, file sizes vary significantly from instrument to instrument: some produce files on the order of gigabytes, while others produce files on the order of kilobytes, but in very large numbers. Any approach to this problem therefore has to be scalable (in terms of both file size and number of files) and fully automated.
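One way to make this concrete is a single automated pass that keeps only running totals per instrument, so memory use does not grow with the number of files. The sketch below is illustrative only: the `REQUIRED_FIELDS` list, the record layout, and the "completeness" score are assumptions made for the example, not fields taken from the NeXus format or the ICAT schema, and completeness is only one possible proxy for whether a file can be interpreted by someone other than the original investigators.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical fields an outside researcher might need in order to interpret
# a file. These names are assumptions, not part of NeXus or ICAT.
REQUIRED_FIELDS = ("instrument", "sample", "start_time", "investigator")


def completeness(metadata: Dict[str, object]) -> float:
    """Return the fraction of REQUIRED_FIELDS that are present and non-empty."""
    present = sum(1 for field in REQUIRED_FIELDS
                  if metadata.get(field) not in (None, ""))
    return present / len(REQUIRED_FIELDS)


def summarise(
    records: Iterable[Tuple[str, int, Dict[str, object]]]
) -> Dict[str, Dict[str, float]]:
    """Stream (instrument, size_bytes, metadata) records into per-instrument totals.

    Only three counters are kept per instrument, so the input can be a
    generator over millions of files.
    """
    totals = defaultdict(lambda: {"files": 0, "bytes": 0, "score_sum": 0.0})
    for instrument, size_bytes, metadata in records:
        entry = totals[instrument]
        entry["files"] += 1
        entry["bytes"] += size_bytes
        entry["score_sum"] += completeness(metadata)
    return {
        name: {
            "files": entry["files"],
            "bytes": entry["bytes"],
            "mean_completeness": entry["score_sum"] / entry["files"],
        }
        for name, entry in totals.items()
    }
```

In practice the records would be produced by a separate step that reads file headers and catalogue entries; that step is not shown here, and it is where most of the per-file cost would fall for the large files.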
Issue champion Simon Lambert (STFC)
Datasets NeXus data files, ICAT catalogue data
