Alastair Duncan

Evaluation points

| Metric | Description | Metric baseline | Metric goal | 20/06/2014, Maps per node 8, Split 1 | 20/06/2014, Maps per node 4, Split 4 | 20/06/2014, Maps per node 4, Split 50 |
|---|---|---|---|---|---|---|
| NumberOfObjectsPerHour | Number of objects processed in one hour | 479.3 | | 998.32 | 720 | 238.55 |
| MaxObjectSizeHandledInGbytes | Max size of raw files | 0.16689453 | | 0.16689453 | 0.16689453 | 0.16689453 |
| MinObjectSizeHandledInGbytes | Min size of raw files | 2.24113e-5 | | 2.24113e-5 | 2.24113e-5 | 2.24113e-5 |
| ThroughputGbytesPerMinute | The throughput of data measured in gigabytes per minute | 0.246 | | 0.513 | 0.370 | 0.123 |
| ThroughputGbytesPerHour | The throughput of data measured in gigabytes per hour | 14.764 | | 30.768 | 22.190 | 7.352 |
| ReliableAndStableAssessment | Manual assessment of whether the experiment ran reliably and stably | true | | true | true | true |
| NumberOfFailedFiles | Number of files that failed in the workflow | 1 | | 1 | 1 | 1 |
| AverageRuntimePerItemInSeconds | The average processing time in seconds per item | 7.51 | | 3.60 | 5.0 | 15.09 |
| throughput in bytes per second | The throughput of data measured in bytes per second | 4403436.169 | | | | |
| number of objects per second | Number of objects that can be processed per second | 0.13 | | 0.28 | | |
| max object size handled in bytes | Max size of raw files | 160312320 | | 160312320 | 160312320 | 160312320 |
| min object size handled in bytes | Min size of raw files | 24064 | | 24064 | 24064 | 24064 |
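
Several rows in the table are different views of the same two measurements: the object rate and the byte rate. The following is a minimal sketch, not part of the original evaluation tooling, that recomputes those rows for the baseline column. It assumes that "Gbytes" means 2^30 bytes, which is the reading under which the reported baseline figures agree with each other. The function name and dictionary keys are illustrative only.

```python
GIB = 2 ** 30  # assumption: "Gbytes" in the table means 2**30 bytes


def derived_metrics(objects_per_hour, bytes_per_second):
    """Recompute the derived table rows from two measured values."""
    return {
        "objects_per_second": objects_per_hour / 3600,
        "avg_runtime_per_item_s": 3600 / objects_per_hour,
        "throughput_gbytes_per_minute": bytes_per_second * 60 / GIB,
        "throughput_gbytes_per_hour": bytes_per_second * 3600 / GIB,
    }


# Baseline column values copied from the table.
baseline = derived_metrics(objects_per_hour=479.3, bytes_per_second=4403436.169)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption the printed values round to the baseline figures reported in the table (0.13 objects per second, 7.51 seconds per item, 0.246 Gbytes per minute, 14.764 Gbytes per hour).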

Evaluation notes

Timings for moving the data onto HDFS and generating the input files for ToMaR are not included in any of the evaluations. Baseline results are for the small dataset, with the non-Hadoop workflow executed using Taverna on a single node from the Hadoop cluster. The baseline timings include Taverna overheads; no per-stage timings are available from the CLI version of Taverna 2.4. Taverna overheads were not included in the Hadoop experiments.
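
To put the Hadoop runs next to the baseline, the object rates from the evaluation table can be expressed as ratios. The sketch below is an illustration only, with hypothetical names. Because the baseline includes Taverna overheads while the Hadoop runs exclude them (and exclude the HDFS load and input-file generation), the ratios are indicative rather than a like-for-like speedup.

```python
# NumberOfObjectsPerHour values copied from the evaluation table.
BASELINE_OBJECTS_PER_HOUR = 479.3
HADOOP_RUNS = {
    "maps per node 8, split 1": 998.32,
    "maps per node 4, split 4": 720.0,
    "maps per node 4, split 50": 238.55,
}


def relative_rate(run_rate, baseline_rate):
    """Ratio of a run's object rate to the baseline rate (>1 means faster)."""
    return run_rate / baseline_rate


ratios = {
    name: relative_rate(rate, BASELINE_OBJECTS_PER_HOUR)
    for name, rate in HADOOP_RUNS.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x baseline")
```

On these figures the first two configurations process objects faster than the baseline, and the split 50 configuration processes them at roughly half the baseline rate.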

The single failure was due to a missing metadata file for one of the test files.

