h2. Metrics Catalogue

To unify metrics across all evaluations, every metric should be registered in this Metrics Catalogue. When choosing metrics for an evaluation, run through the catalogue first: reuse any metric that is already defined, and add a new entry only when none fits (a minimal registry sketch follows the table below).

{code}Use CamelCase notation for metric names, e.g. NumberOfObjectsPerHour{code}

|| Metric || Datatype || Description || Example || Comments ||
| NumberOfObjectsPerHour | integer | Number of objects that can be processed per hour | 250 | Applicable both to component evaluations on a single machine and to entire platform setups |
| IdentificationCorrectnessInPercent | integer | A statistical measure for binary evaluations - [see detailed specification below|#Metricscatalogue-fmeasure] | 85 | Integer between 0 and 100 |
| MaxObjectSizeHandledInGbytes | integer | The maximum file size a workflow/component has handled | 80 | Specify in Gbytes |
| PlanEfficiencyInHours | integer | Number of hours it takes to build one preservation plan with Plato | 20 | Specify in hours |
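As an illustration of how the catalogue could be used programmatically, here is a minimal Python sketch of a metric registry that mirrors the table columns above. The names {{Metric}}, {{REGISTRY}} and {{register}} are hypothetical, chosen for this example only; they do not refer to any existing API.

{code:python}
# Minimal sketch of the metrics catalogue as a registry, mirroring the
# table columns above. Metric, REGISTRY and register() are hypothetical
# names for illustration, not an existing API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str         # CamelCase, e.g. "NumberOfObjectsPerHour"
    datatype: type    # e.g. int
    description: str
    comment: str = ""


REGISTRY: dict[str, Metric] = {}


def register(metric: Metric) -> None:
    """Add a metric to the catalogue, rejecting duplicate names."""
    if metric.name in REGISTRY:
        raise ValueError(f"metric {metric.name} is already defined")
    REGISTRY[metric.name] = metric


register(Metric("NumberOfObjectsPerHour", int,
                "Number of objects that can be processed per hour"))
register(Metric("PlanEfficiencyInHours", int,
                "Number of hours it takes to build one preservation plan with Plato",
                "Specify in hours"))
{code}

Reusing an existing metric then amounts to looking it up by name in the registry rather than defining a new one.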

{anchor:fmeasure}

h2. Binary evaluation method (FMeasure)

We use _sensitivity_ and _specificity_ as statistical measures of the performance of a binary classification test, where

_Sensitivity_ = Σ {color:#99cc00}true different{color} / (Σ {color:#99cc00}true different{color} \+ Σ {color:#ff0000}false similar{color})

and

_Specificity_ = Σ {color:#99cc00}true similar{color} / (Σ {color:#99cc00}true similar{color} \+ Σ {color:#ff0000}false different{color})

and the F-measure is calculated on this basis as shown in the table below:

!BinaryEvaluation.png|border=1,width=551,height=201!

This is one suggested approach. It works well wherever we test for the binary correctness of a result, i.e. it is applicable to characterisation and QA.
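For completeness, the sketch below computes sensitivity and specificity from the four outcome counts of a binary comparison, exactly as defined above. The final combination into an F-measure is the one shown in the table image; since that table is not reproduced here, the code uses the harmonic mean of sensitivity and specificity as an assumed, commonly used combination.

{code:python}
# Minimal sketch: sensitivity and specificity per the formulas above.
# The harmonic-mean F-measure below is an assumption; the authoritative
# combination is the one shown in the BinaryEvaluation.png table.


def sensitivity(true_different: int, false_similar: int) -> float:
    """Sensitivity = true different / (true different + false similar)."""
    return true_different / (true_different + false_similar)


def specificity(true_similar: int, false_different: int) -> float:
    """Specificity = true similar / (true similar + false different)."""
    return true_similar / (true_similar + false_different)


def f_measure(sens: float, spec: float) -> float:
    """Assumed F-measure: harmonic mean of sensitivity and specificity."""
    return 2 * sens * spec / (sens + spec)


# Example: 40 pairs correctly identified as different, 5 missed
# (false similar); 50 correctly identified as similar, 5 wrongly
# flagged as different (false different).
sens = sensitivity(true_different=40, false_similar=5)
spec = specificity(true_similar=50, false_different=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, "
      f"F-measure={f_measure(sens, spec):.2f}")
{code}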
