Evaluation specs component level

Field | Datatype | Value | Description
Evaluation seq. num. | int | 1 | For the first evaluation, leave this field at "1"
Evaluator-ID | email | [email protected] | Unique ID of the evaluator who carried out this specific evaluation
Evaluation description | text | RAW to NeXus format migration - STFC ISIS facility. ISIS data files are typically in RAW or NeXus format. NeXus is an international standard for the neutron and synchrotron communities. RAW is facility-specific, and many historic data files are in this format. Increasingly, NeXus is being adopted as the standard format for instrument data. | Textual description of the evaluation and the overall goals

Performance

Evaluation-Date | DD/MM/YY | 14/11/2012 | Date of evaluation
Dataset(s) | string | OPF STFC scientific datasets | Link to dataset page(s) on the wiki: http://wiki.opf-labs.org/display/SP/STFC+Scientific+Datasets
Workflow method | string | Command line; Java | Taverna / command line / direct Hadoop, etc.
Workflow(s) involved | URL(s) | n/a | Link(s) to myExperiment if applicable
Tool(s) involved | URL(s) | NeXus Data Format Windows Distribution Kits; raw2nexus, part of the Mantid software framework (http://www.mantidproject.org/Main_Pag) | Link(s) to distinct versions of specific components/tools in the component registry if applicable
Link(s) to Scenario(s) | URL(s) | General Scientific Data Handling Scenarios: http://wiki.opf-labs.org/display/SP/RDST2+Format+Migration+of+%28raw%29+Scientific+Datasets |

Technical setup

Field | Datatype | Value | Description
Description | String | Windows | Human-readable description of the "platform", e.g. Bjarne's Linux PC
Total number of physical CPUs | integer | 1 | Number of CPUs involved
CPU specs | string | 2nd generation Intel® Core™ i5-2557M processor with Intel® Turbo Boost Technology 2.0 | Specification of CPUs
Total number of CPU-cores | integer | 1 | Number of CPU-cores involved
Total amount of RAM in Gbytes | integer | 6 | Total amount of RAM on all nodes
Operating System | String | Windows 7 Professional 64-bit | Linux (specific distribution), Windows (specific distribution), other?
Storage system/layer | String | local file system | NFS, HDFS, local files, ?

Evaluation points

Metrics must come from, or be registered in, the metrics catalogue.

Metric | Baseline definition | Baseline value | Goal | Evaluation 1 (14/11/2012) | Evaluation 2 (date) | Evaluation 3 (date)
ThroughputGbytesPerMinute | The evaluation was carried out on a single machine. Each NeXus file is created from a RAW file, which contains the data, and a collection of log (text) files containing sample environment data. As of Nov 2012, ISIS holds roughly 16.5 TB of RAW and log files. At the evaluated rate of 1.73 GB/min, processing all 16.5 TB would take about 7 days. Our goal is to complete it in one day, which requires roughly 12 GB/min. | n/a | 12 GB/min | 1.73 GB/min | - | -
NumberOfObjectsPerHour | Given the evaluated throughput of 1.73 GB/min, we expected NumberOfObjectsPerHour to be much higher. This could be due to the varied sizes of the files: log files are typically very small, some as small as 1 KB, whereas RAW files can be well over 10 GB. On average, 6 log files are needed together with one RAW file to create one NeXus file, although the number of log files required differs from instrument to instrument. As of Nov 2012, ISIS holds roughly 11,000,000 files: 9,500,000 log (text) files and 1,500,000 RAW files. At the evaluated rate of 1,152 objects/hour, processing the whole set would take about 13 months. Since many of the files are very small, and Hadoop will be used for parallelisation, we have set a conservative goal of completing it in one month. | n/a | 15,300 objects/hour | 1,152 objects/hour | - | -
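The time estimates in both metric rows are simple rate arithmetic. The sketch below reproduces them so the projections can be re-run as the archive grows or as new evaluation figures come in. It assumes decimal units (1 TB = 1,000 GB) and a 30-day month; the evaluation does not state which conventions were used.

```python
"""Back-of-envelope projections for the ISIS RAW-to-NeXus migration.

Assumptions (not stated in the evaluation): decimal units (1 TB = 1000 GB)
and a 30-day month.
"""

TOTAL_DATA_GB = 16.5 * 1000            # ~16.5 TB of RAW and log files, Nov 2012
TOTAL_FILES = 9_500_000 + 1_500_000    # log (text) files + RAW files
MEASURED_GB_PER_MIN = 1.73             # Evaluation 1, ThroughputGbytesPerMinute
MEASURED_OBJECTS_PER_HOUR = 1152       # Evaluation 1, NumberOfObjectsPerHour

MINUTES_PER_DAY = 24 * 60
HOURS_PER_MONTH = 30 * 24              # assumed 30-day month


def days_needed(total_gb: float, gb_per_min: float) -> float:
    """Days to process total_gb at a constant gb_per_min."""
    return total_gb / gb_per_min / MINUTES_PER_DAY


def months_needed(total_files: int, objects_per_hour: float) -> float:
    """30-day months to process total_files at a constant objects_per_hour."""
    return total_files / objects_per_hour / HOURS_PER_MONTH


def gb_per_min_for(total_gb: float, days: float) -> float:
    """Throughput needed to finish total_gb within the given number of days."""
    return total_gb / (days * MINUTES_PER_DAY)


def objects_per_hour_for(total_files: int, months: float) -> float:
    """Object rate needed to finish total_files within the given number of months."""
    return total_files / (months * HOURS_PER_MONTH)


if __name__ == "__main__":
    # prints: 6.6 days
    print(f"{days_needed(TOTAL_DATA_GB, MEASURED_GB_PER_MIN):.1f} days")
    # prints: 11.5 GB/min for one day
    print(f"{gb_per_min_for(TOTAL_DATA_GB, 1):.1f} GB/min for one day")
    # prints: 13.3 months
    print(f"{months_needed(TOTAL_FILES, MEASURED_OBJECTS_PER_HOUR):.1f} months")
    # prints: 15,278 objects/hour for one month
    print(f"{objects_per_hour_for(TOTAL_FILES, 1):,.0f} objects/hour for one month")
```

These match the figures in the table: about 7 days and about 13 months at the measured rates, and goals of roughly 12 GB/min and 15,300 objects/hour. The two metrics imply very different completion times for the same archive, which is the discrepancy noted in the NumberOfObjectsPerHour row.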
             
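Each NeXus file combines one RAW file with its log files (9,500,000 log files for 1,500,000 RAW files is about 6.3 per run, in line with the average of 6 quoted in the table). A natural unit of work for parallel processing, for example one Hadoop map task per run, is therefore a RAW file plus its logs. The sketch below builds those groups. It assumes a hypothetical naming scheme in which a RAW file is `<RUN>.RAW` and its logs are `<RUN>_<LOGNAME>.txt`, and the file names in the example are made up; check the real archive layout before relying on it.

```python
from collections import defaultdict
from pathlib import PurePath


def group_runs(paths):
    """Pair each RAW file with the log files that share its run prefix.

    Assumes a hypothetical naming scheme: <RUN>.RAW for data and
    <RUN>_<LOGNAME>.txt for logs. Returns (jobs, orphans), where jobs maps
    run -> (raw_path, sorted log paths) and orphans lists logs with no RAW file.
    """
    raws = {}
    logs = defaultdict(list)
    for p in paths:
        path = PurePath(p)
        suffix = path.suffix.lower()
        if suffix == ".raw":
            raws[path.stem.upper()] = p
        elif suffix == ".txt" and "_" in path.stem:
            # Run prefix is everything before the first underscore.
            run = path.stem.split("_", 1)[0].upper()
            logs[run].append(p)

    # One job per RAW file; a run with no logs still gets a job.
    jobs = {run: (raw, sorted(logs.get(run, []))) for run, raw in raws.items()}
    # Logs whose run prefix has no matching RAW file.
    orphans = sorted(p for run, found in logs.items() if run not in raws for p in found)
    return jobs, orphans


if __name__ == "__main__":
    files = [
        "ABC00123.RAW", "ABC00123_TEMP1.txt", "ABC00123_PRESSURE.txt",
        "XYZ00456.RAW", "DEF00789_TEMP.txt",
    ]
    jobs, orphans = group_runs(files)
    for run, (raw, run_logs) in sorted(jobs.items()):
        print(run, raw, run_logs)
    print("orphans:", orphans)
```

Reporting orphaned logs separately, rather than dropping them, makes it easier to spot archive gaps before the conversion jobs are submitted.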