
h2. Evaluator(s)

[Bolette Jurik|https://portal.ait.ac.at/sites/Scape/_layouts/userdisp.aspx?ID=59] (SB)

h2. Evaluation points

h5. Assessment of measurable points

|| Metric || Description || Metric baseline || Metric goal || 2014 April 8th\* ||
| NumberOfObjectsPerHour | *Performance efficiency - Capacity / Time behaviour* | 18 (9th-13th November 2012) | 1000 | 204 |
| NumberOfFailedFiles | *Reliability - Runtime stability* | 0 | 0 | 0 |
| QAFalseDifferentPercent | *Functional suitability - Correctness* | 0.412 % (5th-9th November 2012) | 0.412 % | 82.76 % |


\*Based on the small experiment with max split size 128 below: 58 files in 1023.176 s corresponds to roughly 204 objects per hour, and 48 failed comparisons out of 58 give 48/58 ≈ 82.76 %. See below for an explanation of the abysmal correctness score.


h6. Small Experiments

All experiments were run on a file list of *58 files*.

|| max split size (bytes) || duration || launched maps (FFmpeg, mpg321, waveform-compare) || success || failure ||
| 1024 | 37m 58.593s = 2278.593s | 3, 3, 7 | 18 | 40 |
| 512 | 24m 1.9s = 1441.9s | 6, 6, 14 | 0 | 58 |
| 256 | 18m 17.917s = 1097.917s | 12, 12, 28 | 0 | 58 |
| 128 | 17m 3.176s = 1023.176s | 24, 24, 57 | 10 | 48 |
| 64 | 16m 54.703s = 1014.703s | 47, 47, 113 | 0 | 58 |
| 32 | 17m 29.96s = 1049.96s | 93, 93, 225 | 4 | 54 |

The big question is why we get so many failures. The answer is of course that the list of pairs of files to compare is wrong\! This list is created by Taverna Beanshells, and we are missing a sort of the two output lists from the FFmpeg and mpg321 Hadoop jobs before they are combined into the list of pairs that is used as input to the waveform-compare Hadoop job. This should be a fairly quick fix; the sketch below shows the idea.
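
The following is only a minimal sketch of that fix, written as plain Java (the real change belongs in the Taverna Beanshell that builds the pair list); the class, method and variable names are hypothetical:

{code}
// Hedged sketch: sort both job output lists before zipping them into pairs.
// Assumes the two lists have equal length and that corresponding files sort to
// the same position; names below are placeholders, not the real Beanshell ports.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PairListBuilder {

    /** Sort both output lists, then zip them into "fileA fileB" pairs. */
    public static List<String> buildPairs(List<String> ffmpegOutputs, List<String> mpg321Outputs) {
        List<String> a = new ArrayList<String>(ffmpegOutputs);
        List<String> b = new ArrayList<String>(mpg321Outputs);

        // Without this sort, line i of one list does not necessarily refer to the
        // same original file as line i of the other list, so waveform-compare is
        // handed mismatched pairs - which is exactly the failure pattern above.
        Collections.sort(a);
        Collections.sort(b);

        List<String> pairs = new ArrayList<String>(a.size());
        for (int i = 0; i < a.size(); i++) {
            pairs.add(a.get(i) + " " + b.get(i));
        }
        return pairs;
    }
}
{code}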

The exact number of MapReduce maps does not seem to have a big influence on performance, as long as we have more than 12, that is, as long as the max split size is at most 256. We note that we get approximately twice as many launched maps for the waveform-compare Hadoop job, simply because its input list is approximately twice as big, as it is a list of pairs. We could of course adjust this to get approximately the same number of maps for each job, but it does not seem to be important for the performance.
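
For reference, a minimal sketch of how the max split size could be set, assuming the jobs use the standard Hadoop 2.x MapReduce API (the actual job code is not shown on this page); the job name and input path are placeholders:

{code}
// Hedged sketch, not the actual SCAPE job code. Only the split-size call is the
// point here; everything else (job name, input path, mapper, reducer) is omitted
// or hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "example-job");

        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Max split size in bytes for the text file list; a smaller value gives
        // more, smaller splits and hence more launched maps (e.g. 128 as above).
        FileInputFormat.setMaxInputSplitSize(job, 128L);

        // ... set mapper, reducer, output path etc., then submit the job.
    }
}
{code}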


The first round of tests was run to decide on the expected optimal max split size.

The next round of tests will use 128 as max split size and vary the size of the input.

h5. Assessment of non-measurable points

_ReliableAndStableAssessment_ *{_}Reliability - Runtime stability{_}*

_For some evaluation points it makes most sense to give a textual description/explanation._

Please include a note about any goals/objectives omitted, and why.

h2. Technical details

_Remember to include relevant information, links and versions about workflows, tools and APIs (e.g. Taverna, command line, Hadoop, links to MyExperiment, links to tools or their SCAPE names, and links to distinct versions of specific components/tools in the component registry)_

h5. WebDAV

We would like to store sufficient information about an experiment (Hadoop program, configuration, etc.) to be able to rerun it. For this purpose, ONB is providing a WebDAV server - if you have questions or need more information, please contact Sven or Reinhard at ONB.
Taverna workflows will still be stored on [myexperiment.org|http://www.myexperiment.org].

Link: [http://fue.onb.ac.at/scape-tb-evaluation|http://fue.onb.ac.at/scape-tb-evaluation]

Please use the following structure for storing experiment results:

{code}
http://fue.onb.ac.at/scape-tb-evaluation/{institutionid}/{storyid}/{experimentid}/{timestamp}/

Example:
http://fue.onb.ac.at/scape-tb-evaluation/onb/arc2warc/jwat/1374526050/

where institutionid = onb, storyid = arc2warc, experimentid = jwat, timestamp = 1374526050
{code}
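
As a hedged illustration only (not part of the instructions above), a result file could be uploaded into this structure with a plain HTTP PUT, e.g. via java.net.HttpURLConnection. The institution/story/experiment ids, the file name and the absence of authentication below are assumptions - check with Sven or Reinhard at ONB for the actual access details.

{code}
// Hedged sketch: upload a local result file to the WebDAV structure above.
// All ids and the file name are placeholders; authentication (if any) is not handled.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WebDavUpload {
    public static void main(String[] args) throws Exception {
        Path localFile = Paths.get("evaluation-results.txt");   // placeholder file
        long timestamp = System.currentTimeMillis() / 1000L;    // Unix timestamp

        // Placeholder ids following the {institutionid}/{storyid}/{experimentid} scheme.
        String target = "http://fue.onb.ac.at/scape-tb-evaluation/"
                + "sb/examplestory/exampleexperiment/" + timestamp + "/" + localFile.getFileName();

        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(localFile, out);
        }
        System.out.println(target + " -> HTTP " + conn.getResponseCode());
    }
}
{code}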


h2. Evaluation notes

_Could be such things as identified issues, workarounds or data preparation, if not already included above._

QAFalseDifferentPercent: the 82.76 % measured on 2014 April 8th is explained by the incorrect list of file pairs described under Small Experiments above - the output lists from the FFmpeg and mpg321 Hadoop jobs must be sorted before they are combined into pairs for the waveform-compare job.