IS31 Semantic checking of very large data files
Detailed description To ensure that the data to be preserved is of adequate quality, the contents of NeXus data files need to be validated against a given data model. Each data model is specified in a NeXus Definition Language (NXDL) file and contains assertions that define the expected content of a NeXus file. For example, a data model could define a metadata element (key-value pair) called “Integral” to represent the total integral monitor counts for a grazing incidence small angle diffractometer (GISAS), using either x-rays or neutrons. In this scenario, the data type of “Integral” would be integer. For a NeXus data file conforming to this data model, the value(s) assigned to “Integral” must be validated to ensure they are of the appropriate data type.
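The “Integral” example can be illustrated with a minimal sketch. It does not use the NeXus validation toolkit or read NXDL; it assumes the metadata of one file has already been loaded into a plain Python dict, and the names `FieldRule` and `validate_fields` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldRule:
    """One assertion from a data model: a field name and its expected type.

    Hypothetical stand-in for a rule that would really come from an NXDL file.
    """
    name: str
    expected_type: type


def validate_fields(entry: Dict[str, object], rules: List[FieldRule]) -> List[str]:
    """Return a list of problems; an empty list means the entry conforms."""
    problems = []
    for rule in rules:
        if rule.name not in entry:
            problems.append(f"missing field: {rule.name}")
            continue
        value = entry[rule.name]
        # bool is a subclass of int in Python, so reject it for integer fields.
        is_bool_as_int = rule.expected_type is int and isinstance(value, bool)
        if is_bool_as_int or not isinstance(value, rule.expected_type):
            problems.append(
                f"{rule.name}: expected {rule.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


# Assumed data model: "Integral" must be an integer.
rules = [FieldRule("Integral", int)]
print(validate_fields({"Integral": 12345}, rules))    # []
print(validate_fields({"Integral": "12345"}, rules))  # ['Integral: expected int, got str']
```

Returning a list of problems rather than raising on the first one lets a caller report every violation in a file in a single pass.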
Issue champion Simon Lambert (STFC)
Possible Solution approaches Use of the NeXus validation toolkit - developed and used by the NeXus community - as part of the preservation ingest workflow.
Datasets NeXus data files
Evaluation Objectives scalability, automation
* handle files of up to 100 GB without crashing
* fail safe: if the tool fails, it fails gracefully rather than crashing, i.e. effective error handling must be in place so that the environment hosting the tool can capture the error and notify other services that interact with the tool.
* can check very large data files concurrently with semantic checking of small files (up to 5 concurrent threads).
* can perform semantic checking at a data rate of up to 100 MB/s
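The fail-safe and concurrency objectives could be exercised with a harness along the following lines. This is a sketch under stated assumptions, not part of the NeXus validation toolkit: `check` is a hypothetical callable that raises an exception when a file fails validation, and `CheckResult`, `run_safely` and `check_all` are illustrative names.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

MAX_WORKERS = 5  # the concurrency limit named in the objectives


@dataclass
class CheckResult:
    path: str
    ok: bool
    error: Optional[str] = None  # set when the check failed or raised


def run_safely(check: Callable[[str], None], path: str) -> CheckResult:
    """Run one check and turn any exception into a result record."""
    try:
        check(path)
        return CheckResult(path, ok=True)
    except Exception as exc:  # deliberately broad: report the error, do not crash
        return CheckResult(path, ok=False, error=f"{type(exc).__name__}: {exc}")


def check_all(check: Callable[[str], None], paths: Iterable[str]) -> List[CheckResult]:
    """Check files on up to MAX_WORKERS threads; results keep the input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(lambda p: run_safely(check, p), paths))


def fake_check(path: str) -> None:
    """Hypothetical validator used only to demonstrate the harness."""
    if path.endswith("bad.nxs"):
        raise ValueError("Integral is not an integer")


for result in check_all(fake_check, ["large.nxs", "small.nxs", "bad.nxs"]):
    print(result.path, "ok" if result.ok else f"failed: {result.error}")
```

Because every failure is returned as a `CheckResult` instead of propagating, the hosting environment can inspect the records and notify other services. As a rough yardstick for the throughput objective, and taking 1 GB as 1000 MB, a 100 GB file checked at 100 MB/s would take about 1000 seconds, or roughly 17 minutes.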
