Tool Name                              | xcorrsound | pagelyzer | matchbox | jpylyzer
Code for Hadoop on GitHub              | https://github.com/statsbiblioteket/scape-audio-qa-experiments | | | https://github.com/KBNLresearch/hadoop-jp2-experiment
Description of test dataset            | Approx. 1 TB of a 20 TB dataset of 2-hour MP3 radio broadcast files (average file size: 118 MB), plus a derived dataset of 1.1 TB of migrated/converted WAV files (700 files in total) | | | Metamorfoze sample batch, 8047 pages / TIFF images
Link to test dataset                   | Danish Radio broadcasts, mp3 | | | http://wiki.opf-labs.org/display/SP/KB+Metamorfoze+Migration+%28sample+batch%29
Number of nodes                        | 4 physical servers, see SB Hadoop Platform | | | 4 (1 master, 3 worker nodes)
Total number of CPU cores              | 48 | | | 1
CPU specs                              | Intel Xeon X5670 (12M cache, 2.93 GHz, 6.40 GT/s QPI) | | | 2.66 GHz quad-core
Total amount of RAM in GB              | 384 | | | 16
NumberOfObjectsPerHour                 | 508 | | | 17244
AverageRuntimePerItemInHours           | | | | 5.8 × 10⁻⁵
Taverna Hadoop workflow (if available) | http://www.myexperiment.org/workflows/4080.html | | |

Efforts for Taverna Hadoop workflow:

http://wiki.opf-labs.org/display/SP/QA+Taverna+Hadoop

Jpylyzer calculation

(raw data from http://wiki.opf-labs.org/pages/viewpage.action?pageId=36012209)

  • 8047 images
  • total time (jpylyzer only): 0h28m = 28/60 ≈ 0.47 hrs

So:

  • NumberOfObjectsPerHour = 8047 / (28/60) ≈ 17244
  • AverageRuntimePerItemInHours = (28/60) / 8047 ≈ 5.8 × 10⁻⁵
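The arithmetic above can be sketched as a few lines of Python (the image count and wall-clock time are taken from the raw data linked above; the variable names are illustrative only):

```python
# Throughput figures for the jpylyzer run: 8047 images processed in 28 minutes.
NUM_IMAGES = 8047
TOTAL_MINUTES = 28

total_hours = TOTAL_MINUTES / 60                # ≈ 0.47 h
objects_per_hour = NUM_IMAGES / total_hours     # NumberOfObjectsPerHour
avg_runtime_per_item = total_hours / NUM_IMAGES # AverageRuntimePerItemInHours

print(round(objects_per_hour))                  # 17244
print(f"{avg_runtime_per_item:.1e}")            # 5.8e-05
```

Note that dividing by the rounded 0.47 h instead of the exact 28/60 h would give 17121 objects/hour rather than the 17244 reported, so the unrounded running time is used throughout.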