
Metrics Catalogue

We are currently merging the initial evaluation metrics catalogue into the attribute/measure catalogue being developed in the PW work package. Once the merge is complete, all experiments should use metrics from the PW catalogue only.

This page will also describe (or link to other pages describing) how to find and navigate the catalogue, so that the proper metrics are easy to find and more time can be spent on the experiments.

Below are the previously used evaluation metrics.

To unify metrics across all evaluations, every metric should be registered in this Metrics Catalogue. When picking metrics for an evaluation, go through the catalogue and pick any that are already defined, or enter a new metric when needed.

||Metric||PW catalogue||URI||Datatype||Description||Example||Comments||
|NumberOfObjectsPerHour| | |integer|Number of objects that can be processed per hour|250|Can be used both for component evaluations on a single machine and for entire platform setups|
|IdentificationCorrectnessInPercent| | |integer|A statistical measure for binary evaluations - see the detailed specification below|85 %|Between 0 and 100|
|MaxObjectSizeHandledInGbytes| | |integer|The maximum file size a workflow/component has handled|80|Specify in Gbytes|
|MinObjectSizeHandledInMbytes| | |integer|The minimum file size a workflow/component has handled - combined with MaxObjectSizeHandledInGbytes, this illustrates the capability of running on heterogeneous file sizes|20|Specify in Mbytes|
|PlanEfficiencyInHours| | |integer|Number of hours it takes to build one preservation plan with Plato|20|Specify in hours|
|ThroughputGbytesPerMinute| | |integer|The throughput of data measured in Gbytes per minute|5|Specify in Gbytes per minute|
|ThroughputGbytesPerHour| | |integer|The throughput of data measured in Gbytes per hour|25|Specify in Gbytes per hour|
|ReliableAndStableAssessment| | |boolean|Manual assessment of whether the experiment performed reliably and stably|true| |
|NumberOfFailedFiles| | |integer|Number of files that failed in the workflow|0| |
|NumberOfFailedFilesAcceptable| | |boolean|Manual assessment of whether the number of files that fail in the workflow is acceptable|true| |
|QAFalseDifferentPercent| | |integer|Number of content comparisons that find original and migrated objects different, even though human spot checking says they are similar|5 %|Between 0 and 100|
|AverageRuntimePerItemInHours| | |float|The average processing time in hours per item|15|Positive floating point number|
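
To illustrate how the throughput and runtime metrics above relate to the raw figures recorded during an experiment, here is a minimal Python sketch. All input values and variable names are hypothetical, chosen only for the example; the catalogue does not prescribe any tooling.

{code:python}
# Minimal sketch: deriving catalogue metrics from the raw figures of one
# evaluation run. The input values below are hypothetical; substitute the
# numbers recorded for your own experiment.

objects_processed = 1000   # objects handled by the workflow
data_volume_gbytes = 50.0  # total data volume in Gbytes
runtime_hours = 4.0        # total wall-clock runtime in hours
failed_files = 0           # files that failed in the workflow

# NumberOfObjectsPerHour (integer)
number_of_objects_per_hour = int(objects_processed / runtime_hours)

# ThroughputGbytesPerHour (integer)
throughput_gbytes_per_hour = int(data_volume_gbytes / runtime_hours)

# AverageRuntimePerItemInHours (positive float)
average_runtime_per_item_in_hours = runtime_hours / objects_processed

# NumberOfFailedFiles (integer)
number_of_failed_files = failed_files

print(number_of_objects_per_hour)         # 250
print(throughput_gbytes_per_hour)         # 12
print(average_runtime_per_item_in_hours)  # 0.004
{code}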

An attribute/measure catalogue is also being developed in PW; this evaluation metrics catalogue will be merged with the PW catalogue in year 3.

If you want a quick glance at the PW catalogue, it is located here (Google Docs): https://docs.google.com/spreadsheet/ccc?key=0An_F2fZCFRRtdGZ6NFg0eFI3b3NIdktMSzBtWmhKUHc&pli=1#gid=0

Write to Christoph Becker at [email protected] to ask for access to the Google doc.

If you are already familiar with the PW catalogue, you are of course most welcome to use existing metrics from there - this will make the merging in year 3 much easier. It is, however, currently NOT a requirement.

Binary evaluation method (F-measure)

We use sensitivity and specificity as statistical measures of the performance of the binary classification test, where

Sensitivity = Σ true different / (Σ true different + Σ false similar)

and

Specificity = Σ true similar / (Σ true similar + Σ false different)

and the F-measure is calculated on this basis.

This is one suggested approach, which applies nicely when we test for the binary correctness of calculations, i.e. it is applicable to characterisation and QA.
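
A minimal Python sketch of the method is given below. The exact combination used for the F-measure is not specified above, so the sketch assumes the standard harmonic mean of the two measures; the counts passed in are hypothetical.

{code:python}
# Minimal sketch of the binary evaluation method. The harmonic-mean
# F-measure is an assumption, not a definition taken from this page.

def binary_evaluation(true_different, false_similar,
                      true_similar, false_different):
    # Sensitivity = Σ true different / (Σ true different + Σ false similar)
    sensitivity = true_different / (true_different + false_similar)
    # Specificity = Σ true similar / (Σ true similar + Σ false different)
    specificity = true_similar / (true_similar + false_different)
    # Assumed F-measure: harmonic mean of sensitivity and specificity
    f_measure = 2 * sensitivity * specificity / (sensitivity + specificity)
    return sensitivity, specificity, f_measure

# Hypothetical counts from spot-checked QA comparisons
sens, spec, f = binary_evaluation(true_different=40, false_similar=10,
                                  true_similar=45, false_different=5)
print(sens, spec, f)  # 0.8 0.9 0.847...
{code}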
