We are currently merging the initial evaluation metrics catalogue into the attribute/measure catalogue being developed in the PW work package. Once this has happened, all experiments should only use metrics from the latter catalogue.
This page will also describe (or refer to other pages describing) how to find and navigate the catalogue, so that the proper metrics are easy to find and more time can be spent on the experiments.
To unify metrics across all evaluations, all metrics should be registered in this Metrics Catalogue. So when picking metrics for an evaluation, run through the catalogue and reuse any metric already defined, or enter a new metric when needed.
||Metric||Datatype||Description||Example||Comment||PW catalogue||
|NumberOfObjectsPerHour|integer|Number of objects that can be processed per hour| |Could be used both for component evaluations on a single machine and on entire platform setups| |
| | |Defining a statistical measure for binary evaluations - see the detailed specification below|85 %|Between 0 and 100| |
|MaxObjectSizeHandledInGbytes|integer|The max file size a workflow/component has handled| |Specify in Gbytes| |
|MinObjectSizeHandledInMbytes|integer|The min file size a workflow/component has handled - illustrates the capability of running on heterogeneous file sizes when combined with MaxObjectSizeHandledInGbytes|20|Specify in Mbytes| |
| | |Number of hours it takes to build one preservation plan with Plato| |Specify in hours| |
| | |The throughput of data measured in Gbytes per minute| |Specify in Gbytes per minute| |
| | |The throughput of data measured in Gbytes per hour| |Specify in Gbytes per hour| |
| | |Manual assessment of whether the experiment performed reliably and stably| | | |
|NumberOfFailedFiles| |Number of files that failed in the workflow| | | |
|NumberOfFailedFilesAcceptable|boolean|Manual assessment of whether the number of files that fail in the workflow is acceptable| | | |
|QAFalseDifferentPercent|integer|Percentage of content comparisons resulting in "original and migrated different", even though human spot checking says "original and migrated similar"|5%|Between 0 and 100| |
| |float|The average processing time in hours per item| |Positive floating point number| |
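Several of the runtime metrics above can be derived from the same three raw measurements of a run: the number of objects processed, the total data volume, and the wall-clock time. Below is a minimal Python sketch of that derivation; the function name, the result keys, and the example numbers are hypothetical illustrations, so substitute the exact metric names from the catalogue.
{code:language=python}
# Illustrative sketch only: derives several of the runtime metrics above
# from three raw measurements of one experiment run. The function name,
# result keys and example numbers are hypothetical, not catalogue entries.

def derive_runtime_metrics(num_objects, total_gbytes, wallclock_hours):
    """Derive per-hour, per-minute and per-item metrics from one run."""
    return {
        "NumberOfObjectsPerHour": num_objects / wallclock_hours,
        "ThroughputInGbytesPerHour": total_gbytes / wallclock_hours,
        "ThroughputInGbytesPerMinute": total_gbytes / (wallclock_hours * 60),
        "AverageRuntimeInHoursPerItem": wallclock_hours / num_objects,
    }

# Example run: 10000 objects totalling 120 Gbytes, processed in 8 hours.
print(derive_runtime_metrics(10000, 120.0, 8.0))
{code}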
An attribute/measure catalogue is also being developed in PW; this evaluation metrics catalogue will be merged with the PW catalogue in year-3.
If you want to have a quick glance at the PW catalogue, it is located here (Google Docs): https://docs.google.com/spreadsheet/ccc?key=0An_F2fZCFRRtdGZ6NFg0eFI3b3NIdktMSzBtWmhKUHc&pli=1#gid=0
Write to Christoph Becker at [email protected] to ask for access to the Google Doc.
If you are already familiar with the PW catalogue, you are of course most welcome to use metrics that already exist in there - this will make the merging in year-3 much easier. But this is currently NOT a requirement.
We use sensitivity and specificity as statistical measures of the performance of the binary classification test, where
Sensitivity = Σ true different / (Σ true different + Σ false similar)
Specificity = Σ true similar / (Σ true similar + Σ false different)
and the F-measure is calculated on this basis as shown in the table below:
This is one suggested approach, and it is nicely applicable when we test for binary correctness of calculations, i.e. it is applicable for characterisation and QA.
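As a worked illustration, here is a small Python sketch of the two measures exactly as defined above, assuming raw counts of the four comparison outcomes are available. Since the table defining the F-measure is given separately, the sketch falls back on the conventional F1 definition (harmonic mean of precision and sensitivity); adjust it to match the table if they differ.
{code:language=python}
# Sketch of the sensitivity/specificity computation defined above.
# "True different" plays the role of a true positive: the comparison
# reported a difference and human spot checking agrees.

def binary_qa_measures(true_different, false_similar, true_similar, false_different):
    sensitivity = true_different / (true_different + false_similar)
    specificity = true_similar / (true_similar + false_different)
    # Assumption: the conventional F1 definition (harmonic mean of
    # precision and sensitivity); the table that defines the project's
    # F-measure is given separately.
    precision = true_different / (true_different + false_different)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_measure

# Example: 90 true different, 10 false similar, 85 true similar, 15 false different.
print(binary_qa_measures(90, 10, 85, 15))
{code}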