To unify metrics across all evaluations, every metric should be registered in this Metrics Catalogue. When picking metrics for an evaluation, go through the catalogue and reuse any metric that is already defined, or enter a new metric when needed.

Metric | Datatype | Description | Example | Comments |
---|---|---|---|---|
NumberOfObjectsPerHour | integer | Number of objects that can be processed per hour | 250 | Could be used both for component evaluations on a single machine and on entire platform setups |
IdentificationCorrectnessInPercent | integer | Defining a statistical measure for binary evaluations - see detailed specification below | 85% | Between 0 and 100 |
MaxObjectSizeHandledInGbytes | integer | The max file size a workflow/component has handled | 80 | Specify in Gbytes |
PlanEfficiencyInHours | integer | Number of hours it takes to build one preservation plan with Plato | 20 | Specify in hours |
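
The "reuse an existing metric or enter a new one" rule can be sketched as a small in-memory registry. This is only an illustration: the `Metric` and `MetricsCatalogue` classes and their method names are hypothetical and not part of any existing tool; the fields simply mirror the table columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One catalogue row; field names mirror the table columns (assumed layout)."""
    name: str
    datatype: str
    description: str
    example: str
    comments: str = ""


class MetricsCatalogue:
    """Hypothetical registry that keeps exactly one definition per metric name."""

    def __init__(self):
        self._metrics = {}

    def get_or_register(self, metric):
        # Reuse the existing definition if the name is known, otherwise add it.
        return self._metrics.setdefault(metric.name, metric)

    def names(self):
        # Sorted list of all registered metric names.
        return sorted(self._metrics)


catalogue = MetricsCatalogue()
objects_per_hour = catalogue.get_or_register(
    Metric(
        name="NumberOfObjectsPerHour",
        datatype="integer",
        description="Number of objects that can be processed per hour",
        example="250",
    )
)
```

Asking for `NumberOfObjectsPerHour` a second time returns the first definition rather than creating a competing one, which is the point of keeping a shared catalogue.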

## Binary evaluation method (FMeasure)

We use *sensitivity* and *specificity* as statistical measures of the performance of the binary classification test where

*Sensitivity* = Σ true different / (Σ true different + Σ false similar)

and

*Specificity* = Σ true similar / (Σ true similar + Σ false different)

and the F-measure is calculated on this basis as shown in the table below:

This is one suggested approach. It works well when testing for binary correctness of calculations, i.e. it is applicable to characterisation and QA.
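
As a minimal sketch, the sensitivity and specificity formulas above can be computed from the four raw counts. The function and parameter names are assumptions chosen to match the terms in the formulas, and the counts in the usage lines are made-up illustration values, not evaluation results. The sketch covers only these two measures.

```python
def sensitivity(true_different, false_similar):
    """Sensitivity = true different / (true different + false similar)."""
    total = true_different + false_similar
    if total == 0:
        raise ValueError("no actually-different cases in the sample")
    return true_different / total


def specificity(true_similar, false_different):
    """Specificity = true similar / (true similar + false different)."""
    total = true_similar + false_different
    if total == 0:
        raise ValueError("no actually-similar cases in the sample")
    return true_similar / total


# Illustrative counts only (hypothetical test run).
sens = sensitivity(true_different=40, false_similar=10)
spec = specificity(true_similar=45, false_different=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Both functions raise an error instead of returning a value when the denominator is zero, since the measure is undefined for a sample that contains no cases of that kind.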

