
Metrics catalogue

We are currently merging the initial evaluation metrics catalogue into the attribute/measure catalogue being developed in the PW work package. Once the merge is complete, all experiments should use only metrics from the PW catalogue.
Preferably, this page will also describe how to navigate the catalogue, so that the proper metrics are easy to find and more time can be spent on the experiments.

Picking metrics

When picking metrics for an evaluation, go through the catalogue and pick any that are already defined, or enter a new metric when needed.

The attribute/measure catalogue developed in PW can be found here: Measures
An equivalent attribute/measure source can be found in this Google doc: Measures by google doc
Write to Kresimir Duretec for access to the Google doc.

These are the previously used evaluation metrics:

PW catalogue
NumberOfObjectsPerHour   integer   Number of objects that can be processed per hour
Could be used both for component evaluations on a single machine and on entire platform setups
IdentificationCorrectnessInPercent   integer   A statistical measure for binary evaluations; see the detailed specification below. Example: 85 %
Between 0 and 100
MaxObjectSizeHandledInGbytes   The max file size a workflow/component has handled
Specify in Gbytes
MinObjectSizeHandledInMbytes   integer   The min file size a workflow/component has handled; illustrates the capability of running on heterogeneous file sizes when combined with MaxObjectSizeHandledInGbytes. Example: 20
Specify in Mbytes
PlanEfficiencyInHours   integer   Number of hours it takes to build one preservation plan with Plato
Specify in hours
The throughput of data measured in Gbytes per minute
Specify in Gbytes per minute
ThroughputGbytesPerHour   integer   The throughput of data measured in Gbytes per hour
Specify in Gbytes per hour
ReliableAndStableAssessment   boolean   Manual assessment of whether the experiment performed reliably and stably
NumberOfFailedFiles   integer   Number of files that failed in the workflow
NumberOfFailedFilesAcceptable   boolean   Manual assessment of whether the number of files that fail in the workflow is acceptable
QAFalseDifferentPercent   integer   Percentage of content comparisons that report original and migrated as different even though human spot checking says they are similar. Example: 5 %
Between 0 and 100
  float   The average processing time in hours per item
Positive floating point number
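
As an illustration of how the rate-based metrics above relate to raw run data, the following Python sketch derives NumberOfObjectsPerHour, ThroughputGbytesPerHour and the average processing time per item from a single run. The RunSummary record, the function names and the choice of 1 Gbyte = 1024^3 bytes are assumptions made for this example; the catalogue does not prescribe them. Rounding to whole numbers follows the integer datatype listed above.

```python
from dataclasses import dataclass

# Assumption: binary gigabytes. The catalogue does not say which definition to use.
BYTES_PER_GBYTE = 1024 ** 3
SECONDS_PER_HOUR = 3600


@dataclass
class RunSummary:
    """Hypothetical record of one experiment run (not part of the catalogue)."""
    objects_processed: int
    bytes_processed: int
    elapsed_seconds: float


def _elapsed_hours(run: RunSummary) -> float:
    if run.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return run.elapsed_seconds / SECONDS_PER_HOUR


def number_of_objects_per_hour(run: RunSummary) -> int:
    """NumberOfObjectsPerHour, rounded because the catalogue lists it as integer."""
    return round(run.objects_processed / _elapsed_hours(run))


def throughput_gbytes_per_hour(run: RunSummary) -> int:
    """ThroughputGbytesPerHour, rounded because the catalogue lists it as integer."""
    gbytes = run.bytes_processed / BYTES_PER_GBYTE
    return round(gbytes / _elapsed_hours(run))


def average_hours_per_item(run: RunSummary) -> float:
    """Average processing time in hours per item (float in the catalogue)."""
    if run.objects_processed <= 0:
        raise ValueError("objects_processed must be positive")
    return _elapsed_hours(run) / run.objects_processed


if __name__ == "__main__":
    # Made-up numbers: 1000 objects, 50 Gbytes, two hours of wall-clock time.
    run = RunSummary(objects_processed=1000,
                     bytes_processed=50 * BYTES_PER_GBYTE,
                     elapsed_seconds=2 * SECONDS_PER_HOUR)
    print(number_of_objects_per_hour(run))
    print(throughput_gbytes_per_hour(run))
    print(average_hours_per_item(run))
```

Whether elapsed time should be wall-clock time for the whole platform or summed per-component time depends on the experiment, so it is left to the caller here.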

Binary evaluation method (FMeasure)

We use sensitivity and specificity as statistical measures of the performance of the binary classification test, where
Sensitivity = Σ true different / (Σ true different + Σ false similar)
Specificity = Σ true similar / (Σ true similar + Σ false different)
and the F-measure is calculated on this basis as shown in the table below:


This is one suggested approach. It applies well when we test for binary correctness of calculations, i.e. it is applicable to characterisation and QA.
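
The sensitivity and specificity definitions above can be sketched in Python as follows. The input format (pairs of ground truth and tool verdict), the Counts type and the function names are assumptions made for this example, not part of the catalogue. The F-measure step is left out because its definition is given in the table rather than in the text.

```python
from typing import Iterable, NamedTuple, Tuple


class Counts(NamedTuple):
    """Tallies for a binary similar/different comparison (hypothetical helper type)."""
    true_different: int   # really different, reported different
    false_similar: int    # really different, reported similar
    true_similar: int     # really similar, reported similar
    false_different: int  # really similar, reported different


def tally(results: Iterable[Tuple[bool, bool]]) -> Counts:
    """Count outcomes from (really_different, reported_different) pairs."""
    td = fs = ts = fd = 0
    for really_different, reported_different in results:
        if really_different and reported_different:
            td += 1
        elif really_different:
            fs += 1
        elif reported_different:
            fd += 1
        else:
            ts += 1
    return Counts(td, fs, ts, fd)


def sensitivity(c: Counts) -> float:
    """Sensitivity = true different / (true different + false similar)."""
    denominator = c.true_different + c.false_similar
    if denominator == 0:
        raise ValueError("no really-different pairs in the sample")
    return c.true_different / denominator


def specificity(c: Counts) -> float:
    """Specificity = true similar / (true similar + false different)."""
    denominator = c.true_similar + c.false_different
    if denominator == 0:
        raise ValueError("no really-similar pairs in the sample")
    return c.true_similar / denominator


if __name__ == "__main__":
    # Made-up sample: 10 really different pairs, 90 really similar pairs.
    sample = ([(True, True)] * 8 + [(True, False)] * 2
              + [(False, False)] * 85 + [(False, True)] * 5)
    counts = tally(sample)
    print(counts, sensitivity(counts), specificity(counts))
```

The ground truth in each pair would typically come from human spot checking, as in the QAFalseDifferentPercent metric.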
