| Tool Name | xcorrsound | pagelyzer | matchbox | jpylyzer |
|---|---|---|---|---|
| Code for Hadoop on GitHub | https://github.com/statsbiblioteket/scape-audio-qa-experiments | | | https://github.com/KBNLresearch/hadoop-jp2-experiment |
| Description of test dataset | Approx. 1 TB of a 20 TB dataset of 2-hour mp3 radio broadcast files (average file size: 118 MB) + derived dataset of 1.1 TB of migrated/converted wav files (700 files in total) | | | Metamorfoze sample batch, 8047 pages / TIFF images |
| Link to test dataset | Danish Radio broadcasts, mp3 | | | http://wiki.opf-labs.org/display/SP/KB+Metamorfoze+Migration+%28sample+batch%29 |
| Number of nodes | 4 physical servers, see SB Hadoop Platform | | | 4 (1 master, 3 worker nodes) |
| Total number of CPU cores | 48 | | | 1 |
| CPU specs | Intel® Xeon® Processor X5670 (12M Cache, 2.93 GHz, 6.40 GT/s Intel® QPI) | | | 2.66 GHz quad-core |
| Total amount of RAM in GB | 384 | | | 16 |
| NumberOfObjectsPerHour | 508 | | | 17244 |
| AverageRuntimePerItemInHours | | | | 5.8 × 10^-5 |
| Taverna Hadoop workflow (if available) | http://www.myexperiment.org/workflows/4080.html | | | |

Efforts for Taverna Hadoop workflow:
http://wiki.opf-labs.org/display/SP/QA+Taverna+Hadoop
Jpylyzer calculation
(raw data from http://wiki.opf-labs.org/pages/viewpage.action?pageId=36012209)
- 8047 images
- total time (jpylyzer only): 0h28m = 28/60 ≈ 0.47 hrs
So, using the unrounded duration of 28/60 hrs:
- NumberOfObjectsPerHour = 8047 / (28/60) ≈ 17244
- AverageRuntimePerItemInHours = (28/60) / 8047 ≈ 5.8 × 10^-5
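
The calculation above can be reproduced with a short script. This is a minimal sketch, not part of the original experiment code: the function name `throughput_metrics` and its parameters are hypothetical, and the only inputs are the object count and wall-clock time quoted above. It keeps the duration exact (28/60 hours) rather than rounding to 0.47 first, which is what yields the figures in the table.

```python
from fractions import Fraction


def throughput_metrics(num_objects: int, hours: int, minutes: int):
    """Return (objects per hour, average runtime per object in hours).

    Hypothetical helper for illustration; inputs are the number of
    processed objects and the total wall-clock time as hours + minutes.
    """
    # Keep the duration exact so early rounding does not skew the result.
    total_hours = Fraction(hours) + Fraction(minutes, 60)
    objects_per_hour = num_objects / total_hours
    avg_runtime_hours = total_hours / num_objects
    return float(objects_per_hour), float(avg_runtime_hours)


# Figures from the jpylyzer run: 8047 images in 0h28m.
objects_per_hour, avg_runtime_hours = throughput_metrics(8047, 0, 28)
print(round(objects_per_hour))      # 17244
print(f"{avg_runtime_hours:.1e}")   # 5.8e-05
```

Dividing by the rounded value 0.47 instead gives roughly 17121 objects per hour, so the rounding step matters when comparing against the table.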