RAW to NeXus format migration - STFC ISIS facility
These files are typically in RAW or NeXus format. NeXus is an international standard for the neutron and synchrotron communities. RAW is facility-specific, and many historic data files are in this format. NeXus is increasingly being adopted as the standard format for instrument data.
14/11/2012
OPF STFC scientific datasets
commandline
Java
NeXus Data Format Windows Distribution Kits 
raw2nexus as part of the Mantid software framework
General Scientific Data Handling Scenarios

Windows
1
2nd generation Intel® Core™ i5-2557M processor with Intel® Turbo Boost Technology 2.0
1
6
Windows 7 Professional 64
Goal Evaluation 1 (date)
Evaluation 2 (date)
Evaluation 3 (date)
ThroughputGbytesPerMinute
The evaluation was completed on a single machine.
A NeXus file is created from a RAW file, which contains the data, and a collection of log (text) files containing sample environment data.

As of Nov 2012, ISIS has roughly 16.5 TB of RAW and log files. At the evaluated throughput of 1.73 GB/min, it would take about 7 days to process 16.5 TB of data. Our projected goal is to achieve this in a day.

n/a   Goal: 12 GB/min   Evaluation 1: 1.73 GB/min
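
The arithmetic behind these throughput figures can be checked with a short calculation. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB, which the evaluation does not state) and treating 12 GB/min as the goal throughput. The function name is illustrative and is not part of raw2nexus or Mantid.

```python
# Rough processing-time estimate from a throughput figure.
# Assumption: decimal units (1 TB = 1000 GB); the source does not specify.

def days_to_process(total_tb: float, gb_per_min: float) -> float:
    """Return the number of days needed to process total_tb at gb_per_min."""
    total_gb = total_tb * 1000
    minutes = total_gb / gb_per_min
    return minutes / (60 * 24)

evaluated_days = days_to_process(16.5, 1.73)  # evaluated throughput
goal_days = days_to_process(16.5, 12)         # goal throughput
print(f"evaluated: {evaluated_days:.1f} days, goal: {goal_days:.1f} days")
```

Under these assumptions the evaluated throughput gives between six and seven days, and the goal throughput gives just under one day, which matches the figures quoted in the text.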

With an evaluated throughput of 1.73 GB/min, we expected NumberOfObjectsPerHour to be much higher. The lower figure could be due to the varied size of the files.

Log files are typically very small, and some are as small as 1 KB, whereas RAW files can be well over 10 GB. On average, 6 log files are needed with one RAW file to create one NeXus file. The number of log files required differs from instrument to instrument.

As of Nov 2012, ISIS has roughly 11,000,000 files, of which 9,500,000 are log (text) files and 1,500,000 are RAW files. At the evaluated rate of 1,152 files/hr, it would take about 13 months to process the whole set. Taking into consideration that we have a lot of very small files, and that we plan to use Hadoop for parallelisation, we have projected a conservative goal of achieving this in a month.

n/a   Goal: 15,300/hr   Evaluation 1: 1,152/hr
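
The object-count projection can be reproduced in the same way. The sketch below assumes a 30-day month and treats 15,300 objects per hour as the goal rate. The helper name is hypothetical, and the speed-up factor is derived from the quoted figures rather than measured.

```python
# Projection of total processing time from an objects-per-hour rate.
# Assumption: one month is taken as 30 days.

def months_to_process(total_files: int, files_per_hour: float,
                      days_per_month: int = 30) -> float:
    """Return the months needed to process total_files at files_per_hour."""
    hours = total_files / files_per_hour
    return hours / 24 / days_per_month

LOG_FILES = 9_500_000
RAW_FILES = 1_500_000
TOTAL_FILES = LOG_FILES + RAW_FILES  # roughly 11,000,000 files

evaluated_months = months_to_process(TOTAL_FILES, 1152)
goal_months = months_to_process(TOTAL_FILES, 15_300)
speedup_needed = 15_300 / 1152           # factor parallelisation must deliver
logs_per_raw = LOG_FILES / RAW_FILES     # compare with "on average 6"

print(f"evaluated: {evaluated_months:.1f} months")
print(f"goal: {goal_months:.2f} months")
print(f"speed-up needed: {speedup_needed:.1f}x")
print(f"log files per RAW file: {logs_per_raw:.1f}")
```

On these numbers the goal rate is roughly 13 times the evaluated rate, and the ratio of log files to RAW files comes out slightly above 6, in line with the stated average.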
