
Evaluation specs component level

Field
Datatype
Value
Description
Evaluation seq. num.
int
1
Use only if a subsequent run of the same evaluation is done in a different setup than a previous one.
In that case, copy the Evaluation specs table and fill out a new one with a new sequence number.
For the first evaluation, leave this field at "1".
Evaluator-ID email baj@statsbiblioteket.dk Unique ID of the evaluator that carried out this specific evaluation.
Evaluation description text The overall goals are, first, to evaluate the stability of Taverna as the in-production workflow engine; second, to test FFprobe as a characterisation, file format validation and property validation tool (in combination with Schematron) for 'Large Video Files'; and third, to test the Isilon Storage performance when the scratch storage is moved from NAS/SAN to Isilon Storage.
Textual description of the evaluation and the overall goals
Evaluation-Date DD/MM/YY 29/07/13 Date of evaluation
Dataset(s) string
Danish TV broadcasts, mpeg-2 transport stream
Danish TV broadcast, H.264/MPEG-4 AVC
Danish Radio broadcast, MPEG1-Layer 2
Link to dataset page(s) on WIKI
For each dataset that is a part of an evaluation
make sure that the dataset is described here: Datasets
Workflow method string
Taverna Taverna / Commandline / Direct hadoop etc...
Workflow(s) involved
URL(s)
https://github.com/statsbiblioteket/youseeingestworkflow Link(s) to MyExperiment if applicable
Tool(s) involved
URL(s) Ffprobe (a large number of tools is involved, but the focus in this evaluation is Ffprobe)
Link(s) to distinct versions of specific components/tools in the component registry if applicable
Link(s) to Scenario(s) URL(s)
Characterisation and validation of audio and video files during ingest Link(s) to scenario(s) if applicable
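
The characterisation step named in the scenario above can be illustrated with a small sketch. It is not the code used in the youseeingestworkflow repository; it only shows one plausible way to call ffprobe and reduce its JSON report to the properties a later validation step (for example a Schematron rule) might check. The function names `build_ffprobe_command`, `summarise_probe` and `characterise` are hypothetical, as is the choice of which properties to keep.

```python
import json
import subprocess


def build_ffprobe_command(path):
    """Return an ffprobe command line that prints container and stream info as JSON."""
    return [
        "ffprobe",
        "-v", "error",            # log only errors on stderr
        "-print_format", "json",  # machine-readable report on stdout
        "-show_format",           # container-level properties
        "-show_streams",          # one entry per audio/video stream
        path,
    ]


def summarise_probe(probe_json):
    """Reduce an ffprobe JSON report to a few properties (illustrative selection)."""
    data = json.loads(probe_json)
    streams = data.get("streams", [])
    return {
        "format_name": data.get("format", {}).get("format_name"),
        "video_codecs": [s.get("codec_name") for s in streams
                         if s.get("codec_type") == "video"],
        "audio_codecs": [s.get("codec_name") for s in streams
                         if s.get("codec_type") == "audio"],
    }


def characterise(path):
    """Run ffprobe on a file; return (ok, summary) or (False, error text)."""
    result = subprocess.run(build_ffprobe_command(path),
                            capture_output=True, text=True)
    if result.returncode != 0:
        # A broken file is reported as a failure so it can be re-downloaded.
        return False, result.stderr.strip()
    return True, summarise_probe(result.stdout)
```

A call such as `characterise("/path/to/broadcast.ts")` (path is a placeholder) requires ffprobe on the PATH. The summary only reflects the streams ffprobe detects, so audio tracks that start later in the file may be missing from it.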

Technical setup

Field
Datatype
Value
Description
Description String SB Video File Ingest Platform Human readable description of the "platform" - e.g. Bjarne's Linux PC
Total number of physical CPUs integer 8 Number of CPUs involved
CPU specs string Intel® Xeon® Processor X5355 (8M Cache, 2.66 GHz, 1333 MHz FSB) Specification of CPUs
Total number of CPU-cores integer 32 Number of CPU-cores involved
Total amount of RAM in Gbytes
integer 32 Total amount of RAM on all nodes
Operating System
String Linux (CentOS release 6.3 (Final)) Linux (specific distribution), Windows (specific distribution), other?
Storage system/layer String NAS/SAN + NFS NFS, HDFS, local files, ?

Evaluation points

metrics must come from / be registered in the metrics catalogue

Metric Baseline definition Baseline value (24/07/13) Goal Evaluation 1 (date)
Evaluation 2 (date)
Evaluation 3 (date)
ReliableAndStableAssessment Reliability - Runtime stability  (focus: Taverna workflow)
Manual assessment. The workflow is set up with a workflow monitor in which each step of the workflow is recorded. If a file does not complete the workflow, it is added to the download list again and is run through the workflow again. It is also marked as 'failed' and can be viewed on the workflow monitor GUI, and a mail is sent to the digital content manager. If it later completes, it is simply marked 'completed'. If the workflow cannot complete due to an inconsistency (e.g. wrong file format), the content provider is contacted.
true true
     
NumberOfFailedFilesAcceptable Reliability - Runtime stability (focus: Ffprobe component)
Manual assessment. Ffprobe returns errors on broken files. This is acceptable, as these files should not continue through the workflow but should rather be re-downloaded. Ffprobe may also miss audio tracks that are not present from the beginning of the video file but are added at a later time stamp. This means that the characterisation information from Ffprobe is not complete. However, we also have characterisation information from a different tool, so we consider this acceptable.
true
true
     
MaxObjectSizeHandledInGbytes Performance efficiency - Capacity (focus: Taverna workflow - all components) 3.91 GB
5      
MinObjectSizeHandledInMbytes Performance efficiency - Capacity (focus: Taverna workflow - all components) 113.79 MB
100
     
...
           
NumberOfObjectsPerHour Performance efficiency - Capacity / Time behaviour (focus: Taverna workflow)
The ingest workflow runs for approximately 8 hours a day and ingests between 800 GB and 1 TB of radio and TV broadcasts.
76 100
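
As a rough cross-check of the NumberOfObjectsPerHour figures, the sketch below relates the hourly rate to the daily run time and volume stated in the baseline definition. It assumes that 76 is the baseline and 100 the goal (following the column order), that the rate is steady over the whole 8-hour run, and that 1 TB is counted as 1000 GB. The function names are hypothetical, and the derived averages are only indicative.

```python
def objects_per_day(objects_per_hour, hours_per_day):
    """Objects ingested in one daily run at a steady hourly rate."""
    return objects_per_hour * hours_per_day


def mean_object_size_gb(daily_volume_gb, objects_per_hour, hours_per_day):
    """Average object size implied by a daily volume and an hourly rate."""
    return daily_volume_gb / objects_per_day(objects_per_hour, hours_per_day)


HOURS_PER_DAY = 8  # approximate daily run time from the baseline definition

baseline_per_day = objects_per_day(76, HOURS_PER_DAY)   # baseline rate
goal_per_day = objects_per_day(100, HOURS_PER_DAY)      # goal rate

# Implied average object size at the baseline rate, for the lower and
# upper bound of the stated daily volume (assuming 1 TB = 1000 GB).
low_mean = mean_object_size_gb(800, 76, HOURS_PER_DAY)
high_mean = mean_object_size_gb(1000, 76, HOURS_PER_DAY)

print(baseline_per_day, goal_per_day)
print(round(low_mean, 2), round(high_mean, 2))
```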
     