|| Tool Name \\ || xcorrsound \\ || pagelyzer \\ || matchbox \\ || jpylyzer \\ ||
| *Code for Hadoop on github* \\ | [https://github.com/statsbiblioteket/scape-audio-qa-experiments] | | | [https://github.com/KBNLresearch/hadoop-jp2-experiment] |
| *Description of test dataset* \\ | Approx. 1 TB of a 20 TB dataset of 2-hour mp3 radio broadcast files (average file size: 118 MB) \\
+ derived dataset of 1.1 TB of migrated/converted wav files (700 files in total) \\ | | | Metamorfoze sample batch, 8047 pages / TIFF images |
| *Link to test Dataset* \\ | [SP:Danish Radio broadcasts, mp3]\\ | | | [http://wiki.opf-labs.org/display/SP/KB+Metamorfoze+Migration+%28sample+batch%29] |
| *Number of nodes* | 4 physical servers, see [SP:SB Hadoop Platform]\\ | | | 4 (1 master, 3 worker nodes) |
| *Total number of CPU-cores* | 48 | | | 1 |
| *CPU specs* | Intel® Xeon® Processor X5670    \\
(12M Cache, 2.93 GHz, 6.40 GT/s Intel® QPI) | | | 2.66 GHz Quad-Core |
| *Total amount of RAM in Gbytes* | 384 GB | | | 16 |
| *NumberOfObjectsPerHour* | 508 | | | 17244 |
| *AverageRuntimePerItemInHours* | | | | 5.8 x 10 ^\-5^ |
| *Taverna Hadoop Workflow (if available)* \\ | [http://www.myexperiment.org/workflows/4080.html|http://www.myexperiment.org/workflows/4080.html] | | | |
Efforts for Taverna Hadoop workflow:

[http://wiki.opf-labs.org/display/SP/QA+Taverna+Hadoop]

h2. Jpylyzer calculation

(raw data from [http://wiki.opf-labs.org/pages/viewpage.action?pageId=36012209])

* 8047 images
* total time (jpylyzer only) 0h28m = 28/60 hrs ≈ 0.47 hrs

So:

* NumberOfObjectsPerHour = 8047 / (28/60) ≈ 17244
* AverageRuntimePerItemInHours = (28/60) / 8047 ≈ 5.8 x 10 ^\-5^
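
The two figures above can be reproduced with a few lines of Python. This is only a minimal sketch of the arithmetic: the function name {{throughput}} and its parameters are made up for illustration and are not part of jpylyzer or the Hadoop experiment code. It uses the unrounded 28 minutes, since dividing by the rounded 0.47 hrs gives a slightly different object count.

```python
def throughput(num_objects, total_minutes):
    """Return (objects per hour, average runtime per item in hours).

    Hypothetical helper for illustration only; not part of jpylyzer.
    """
    hours = total_minutes / 60.0
    return num_objects / hours, hours / num_objects


# Values from the jpylyzer run described above: 8047 images in 0h28m.
objects_per_hour, hours_per_item = throughput(8047, 28)

print(round(objects_per_hour))   # 17244
print(f"{hours_per_item:.1e}")   # 5.8e-05
```

The same helper could be applied to the other tools once their total run time and object count are known.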