
|| Metric || Description || Metric baseline || Metric goal || Evaluation 2014 April 8th\* || Evaluation 2014 June 17th-23rd\*\* ||
| [number of objects per second|http://www.purl.org/DP/quality/measures#418] | *Performance efficiency - Capacity / Time behaviour* \\
Number of objects that can be processed per second | 0.005 | 0.28 | 0.0567 | 0.0619 |
| Number Of Objects Per Hour\*\*\* | *Performance efficiency - Capacity / Time behaviour* \\
Number of objects that can be processed per hour | 18 (9th-13th November 2012) | 1000 | 204 | 223 |
| [QAFalseDifferentPercent|http://ifs.tuwien.ac.at/dp/vocabulary/quality/measures#416] | *Functional suitability - Correctness* \\
Ratio of 'QA decided different'/'human judged same', that is, the ratio of content comparisons where QA decided original and migrated file were different while a human judged them the same | | | | |

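As a quick consistency check (a minimal sketch using the figures from the table above), the two throughput measures differ only by a factor of 3600:

{code:java}
public class ThroughputConversion {
    public static void main(String[] args) {
        // June 2014 evaluation result: objects processed per second
        double objectsPerSecond = 0.0619;
        // The older "Number Of Objects Per Hour" measure is the same figure scaled by 3600
        double objectsPerHour = objectsPerSecond * 3600.0;
        System.out.printf("%.4f objects/second = %.0f objects/hour%n",
                objectsPerSecond, objectsPerHour);
        // Prints roughly 223 objects/hour; the April figures relate the same way (0.0567 * 3600 = 204)
    }
}
{code}
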
|| max-split-size || duration || launched map tasks on the three Hadoop jobs ||
| 1024 | 37m, 58.593s = 2278.593s | 3, 3, 7 |
| 512 | 24m, 1.9s = 1441.9s | 6, 6, 14 |
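
The maximum split size controls how the input is divided into map tasks: a smaller maximum yields more, smaller splits, which is consistent with the 512 run launching more map tasks and finishing faster. Below is a minimal sketch of how a maximum input split size can be set with the Hadoop 2 MapReduce API; the job name and the assumption that the table values are megabytes are ours, not taken from the actual workflow:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "mp3-to-wav-migration"); // illustrative job name
        // Cap the input split size at 512 MB (assuming the table values are megabytes);
        // more splits mean more launched map tasks.
        FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);
        // ... set mapper, reducer and input/output paths here, then submit the job
    }
}
{code}
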
| 2014 Jun 18 | 1000 | 2000 (258GB) | 4h, 23m | 8h, 56m | 224 | 111 | 174 | 8.7 |
| 2014 Jun 19 | 999 | 2999 (387GB) | 4h, 20m | 13h, 29m | 222 | 52 | 226 | \~7.5 |
| 2014 Jun 20 | 1000 | 3999 (516GB) | 4h, 27m | 17h, 56m | 223 | 142 | 368 | \~9.2 |
| 2014 Jun 23 | 999 | 4998 (645GB) | 4h, 28m | 22h, 24m | 223 | 67 | 435 | \~8.7 |
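
As a sanity check (a small sketch using only the accumulated totals in the last row above), the overall June throughput reproduces the figures reported in the metrics table:

{code:java}
public class JuneThroughputCheck {
    public static void main(String[] args) {
        // Accumulated totals after the 2014 Jun 23 run
        int totalObjects = 4998;
        double totalHours = 22.0 + 24.0 / 60.0;   // 22h, 24m
        double objectsPerHour = totalObjects / totalHours;
        double objectsPerSecond = objectsPerHour / 3600.0;
        System.out.printf("%.0f objects/hour, %.4f objects/second%n",
                objectsPerHour, objectsPerSecond);
        // Prints roughly 223 objects/hour and 0.0620 objects/second
    }
}
{code}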

The cluster setup used was the June 2014 version of the [SP:SB Hadoop Platform].

h3. WebDAV

The Taverna logs and outputs of the June experiment are stored at [http://fue.onb.ac.at/scape-tb-evaluation/sb/LargeScaleAudioMigration/Mp3ToWavMigrationOnHadoop/], along with the MapReduce client configuration for the SB SCAPE Hadoop cluster.
* We have (stubbornly) kept the old measure _Number Of Objects Per Hour_ in our evaluation, as it is simply easier to read when processing times are as long as in this experiment.
* QAFalseDifferentPercent was introduced as a measure when we were working on smaller annotated datasets; on large-scale real-life datasets it is problematic. A better idea would probably be a _Dissimilar in Percent_ measure combined with a _Correctness judgement_ based on that measure and on prior correctness evaluations on annotated data (see the sketch below). We would then also need a discussion of the adequacy of the solution, taking into account the level of automation and the human resources still needed.
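
A minimal sketch of what such a _Dissimilar in Percent_ measure could look like; the class and method names are illustrative only and not part of any existing tool:

{code:java}
public class DissimilarInPercent {
    /**
     * Proposed measure: the percentage of content comparisons in a run where the QA tool
     * (here xcorrSound waveform-compare) judged original and migrated file to be different,
     * independently of any human judgement.
     */
    public static double dissimilarInPercent(int comparisonsJudgedDifferent, int totalComparisons) {
        if (totalComparisons == 0) {
            throw new IllegalArgumentException("no comparisons performed");
        }
        return 100.0 * comparisonsJudgedDifferent / totalComparisons;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not results from the evaluation runs
        System.out.printf("Dissimilar in Percent: %.2f%%%n", dissimilarInPercent(3, 1000));
    }
}
{code}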

h2. Conclusion

The conclusion is that we are able to migrate our 20TB mp3 collection to wav, including quality assurance, in one month on the SB Hadoop Platform. However, we would need roughly 0.5 petabytes of available storage, which is not feasible, so we will not perform this migration. The xcorrSound waveform-compare tool has proven robust and easy to integrate into a larger workflow, and we will continue maintenance and possibly further development of xcorrSound.
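
A rough back-of-the-envelope check of the one-month estimate (a sketch that assumes the average object size from the June runs, 645GB/4998 objects, and the measured ~223 objects/hour are representative of the full 20TB collection):

{code:java}
public class FullCollectionEstimate {
    public static void main(String[] args) {
        // From the June 2014 runs: 4998 objects corresponded to roughly 645 GB
        double avgGbPerObject = 645.0 / 4998.0;   // ~0.129 GB per object
        double collectionGb = 20.0 * 1024.0;      // the 20TB mp3 collection
        double totalObjects = collectionGb / avgGbPerObject;
        double objectsPerHour = 223.0;            // June 2014 evaluation result
        double hours = totalObjects / objectsPerHour;
        System.out.printf("~%.0f objects, ~%.0f hours, ~%.0f days%n",
                totalObjects, hours, hours / 24.0);
        // Roughly 159,000 objects and 712 hours, i.e. about 30 days of continuous processing
    }
}
{code}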