
h2. Metrics Catalogue

To unify metrics across all evaluations, every metric should be registered in this Metrics Catalogue. When picking metrics for an evaluation, run through the catalogue and reuse an already defined metric, or add a new one when needed.

{code}Use CamelCase notation for metric names - e.g. NumberOfObjectsPerHour{code}


|| Metric \\ || Datatype \\ || Description \\ || Example \\ || Comments \\ ||
| NumberOfObjectsPerHour | integer | Number of objects that can be processed per hour \\ | 250 \\ | Could be used both for component evaluations on a single machine and on entire platform setups \\ |
| IdentificationCorrectnessInPercent | integer \\ | A statistical measure for binary evaluations - [see detailed specification below|#Metricscatalogue-fmeasure] | 85 \\ | Percentage between 0 and 100 \\ |
| MaxObjectSizeHandledInGbytes \\ | integer \\ | The max file size a workflow/component has handled \\ | 80 \\ | Specify in Gbytes \\ |
| PlanEfficiencyInHours | integer \\ | Number of hours it takes to build one preservation plan with Plato \\ | 20 \\ | Specify in hours \\ |

An attribute/measure catalogue is also being developed in PW; this evaluation metrics catalogue will be merged with the PW catalogue in year 3.


If you want to have a quick glance at the PW catalogue, it is located here (Google Docs): [https://docs.google.com/spreadsheet/ccc?key=0An_F2fZCFRRtdGZ6NFg0eFI3b3NIdktMSzBtWmhKUHc&pli=1#gid=0]

Write to Christoph Becker at [[email protected]|mailto:[email protected]] to ask for access to the Google Doc.

If you are already familiar with the PW catalogue, you are of course most welcome to use existing metrics from there - this will make the merge in year 3 much easier. It is, however, currently NOT a requirement.

{anchor:fmeasure}

h2. Binary evaluation method (FMeasure)

We use _sensitivity_ and _specificity_ as statistical measures of the performance of the binary classification test, where 
_Sensitivity_ = Σ {color:#99cc00}true different{color} / (Σ{color:#99cc00} true different{color} \+ Σ {color:#ff0000}false similar{color}) 
and 
_Specificity_ = Σ{color:#99cc00} true similar{color} / (Σ {color:#99cc00}true similar{color} \+ Σ{color:#ff0000} false different{color}) 
and the F-measure is calculated on this basis as shown in the table below:

  !BinaryEvaluation.png|border=1,width=551,height=201!



This is one suggested approach, which applies nicely when we test for the binary correctness of calculations, i.e. it is applicable to characterisation and QA.
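As a minimal sketch of the measures defined above, the snippet below computes sensitivity and specificity from the four outcome counts, treating "different" as the positive class (so _false similar_ counts as a false negative and _false different_ as a false positive). Since the exact F-measure table is only given in the image, the snippet uses the standard F-measure (harmonic mean of precision and sensitivity) as an assumption; the function name and parameter names are illustrative, not part of any project API.

```python
def binary_evaluation(true_different, false_similar, true_similar, false_different):
    """Sensitivity, specificity and a standard F-measure for a binary
    comparison test where "different" is the positive class.

    true_different  - pairs correctly classified as different (TP)
    false_similar   - different pairs wrongly classified as similar (FN)
    true_similar    - pairs correctly classified as similar (TN)
    false_different - similar pairs wrongly classified as different (FP)
    """
    # Sensitivity = TP / (TP + FN), as defined above
    sensitivity = true_different / (true_different + false_similar)
    # Specificity = TN / (TN + FP), as defined above
    specificity = true_similar / (true_similar + false_different)
    # Standard F-measure (assumption): harmonic mean of precision and sensitivity
    precision = true_different / (true_different + false_different)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_measure

# Example: 8 true different, 2 false similar, 9 true similar, 1 false different
sens, spec, f = binary_evaluation(8, 2, 9, 1)
# sensitivity = 0.8, specificity = 0.9
```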