
Evaluation specs platform/system level

| Field | Datatype | Value | Description |
|---|---|---|---|
| Evaluation seq. num. | int | 1 | Use only if a subsequent evaluation of the same kind is done in a different setup from a previous one. In that case copy the Evaluation specs table and fill out a new one with a new sequence number. For the first evaluation leave this field at "1". |
| Evaluator-ID | email | [email protected] | Unique ID of the evaluator that carried out this specific evaluation. |
| Evaluation description | text | Since November 2011 we have been running FITS on a selection of our web content spread over the years from 2005 to 2011. The data is stored in ARC files on a SAN. These ARC files are fetched from the SAN and unpacked, and FITS is run on each ARC record. Running FITS on an ARC record produces an XML file. The XML files from a single ARC file are packed into a TGZ file and made available to the Planning and Watch subproject. To evaluate this job we extract information on the timing of the FITS jobs together with information from the ARC files. | Textual description of the evaluation and the overall goals |
| Evaluation-Date | DD/MM/YY | 25 November 2011 to 8 November 2012 | Date of evaluation |
| Platform-ID | string | SB Test Platform | Unique ID of the platform involved in the particular evaluation - see Platform page included below |
| Dataset(s) | string | http://wiki.opf-labs.org/display/SP/State+and+University+Library+Denmark+-+Web+Archive+Data | Link to dataset page(s) on the wiki. For each dataset that is part of an evaluation, make sure that the dataset is described here: Datasets |
| Workflow method | string | Commandline | Taverna / Commandline / Direct hadoop etc. |
| Workflow(s) involved | URL(s) | None | Link(s) to MyExperiment if applicable |
| Tool(s) involved | URL(s) | fits 0.6.0, arc-unpacker 0.2 | Link(s) to distinct versions of specific components/tools in the component registry if applicable |
| Link(s) to Scenario(s) | URL(s) | WCT3 | Link(s) to scenario(s) if applicable |


SB Test Platform

| Field | Datatype | Value | Description |
|---|---|---|---|
| Platform-ID | String | Platform SB 1 | Unique string that identifies this specific platform. Use the platform name. |
| Platform description | String | We have five Blade servers located at SB | Human-readable description of the platform. Where is it located, contact info, etc. |
| Number of nodes | integer | 5 physical servers | Number of hosts involved - could be both physical hosts as well as virtual hosts |
| Total number of physical CPUs | integer | 10 | Number of CPUs involved |
| CPU specs | string | Intel® Xeon® Processor X5670 (12M Cache, 2.93 GHz, 6.40 GT/s Intel® QPI) | Specification of CPUs |
| Total number of CPU-cores | integer | 60 | Number of CPU-cores involved |
| Total amount of RAM in Gbytes | integer | 288 GB | Total amount of RAM on all nodes |
| Average CPU-cores for nodes | integer | 6 | Number of CPU-cores on average across all nodes |
| Average RAM in Gbytes for nodes | integer | 96 GB | Amount of memory on average across all nodes |
| Operating System on nodes | String | Red Hat based Linux | Linux (specific distribution), Windows (specific distribution), other? |
| Storage system/layer | String | SAN and EMC Isilon | NFS & HDFS |
| Network layer between nodes | String | 2 GB ethernet | Speed of network interfaces, general network speed |

   

Evaluation points

Metrics must come from / be registered in the metrics catalogue.

The motivation behind the goal is as follows: we want to be able to run a FITS-like characterisation on a complete snapshot of the Danish TLD within weeks. Such a snapshot harvest amounts to 25 TB. This requires a throughput on the order of 1 GB/minute, i.e. 60 GB/hour. "FITS-like" is here defined as a characterisation using multiple tools combined with a comparison of the output of these tools.
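The arithmetic behind this goal can be sketched as follows. This is only an illustrative calculation: the 25 TB snapshot size and the 1 GB/minute rate come from the text above, while the use of decimal units (1 TB = 1000 GB) and the helper name `days_to_process` are assumptions made for the example.

```python
# Figures taken from the text: a 25 TB snapshot and a goal of about 1 GB/minute.
SNAPSHOT_TB = 25
GOAL_GB_PER_MINUTE = 1

# Assumption for this sketch: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000


def days_to_process(size_gb, gb_per_hour):
    """Return the number of days needed to process size_gb at gb_per_hour."""
    return size_gb / gb_per_hour / 24


goal_gb_per_hour = GOAL_GB_PER_MINUTE * 60
snapshot_gb = SNAPSHOT_TB * GB_PER_TB
days = days_to_process(snapshot_gb, goal_gb_per_hour)

print(f"Goal throughput: {goal_gb_per_hour} GB/hour")
print(f"Time for one snapshot at the goal rate: {days:.1f} days ({days / 7:.1f} weeks)")
```

Under these assumptions, the goal rate processes one snapshot in roughly two and a half weeks, which matches the "within weeks" target.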

Although the baseline is calculated for one thread on one CPU, the actual assessment ran on a five-machine cluster where each process was allowed to use up to four threads. This experiment is our first evaluation.

| Metric | Baseline definition | Baseline value | Goal | Evaluation 1 (8/11 2012) | Evaluation 2 (date) | Evaluation 3 (date) |
|---|---|---|---|---|---|---|
| ThroughputGbytesPerHour | Measurement of the running time of the FITS jobs assuming one thread on one machine. During the last year the job has actually run on one to five servers using one to four threads, but that job distribution is not represented in the metadata. | 0.162 | 60 | 1.32 | | |
| OrganisationalFit | | N/A | true | true | | |
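
To put the ThroughputGbytesPerHour figures in perspective, the following sketch compares the baseline, the first evaluation and the goal. The three throughput values are copied from the table above. The 25 TB snapshot size comes from the motivation text. The decimal unit conversion (1 TB = 1000 GB) and all variable and function names are assumptions made for this example.

```python
# Throughput values in GB/hour, copied from the evaluation points table.
BASELINE_GB_PER_HOUR = 0.162
GOAL_GB_PER_HOUR = 60
EVALUATION_1_GB_PER_HOUR = 1.32

# Snapshot size from the text; decimal units (1 TB = 1000 GB) are an assumption.
SNAPSHOT_GB = 25 * 1000


def days_for_snapshot(gb_per_hour):
    """Return the number of days needed to process one snapshot at gb_per_hour."""
    return SNAPSHOT_GB / gb_per_hour / 24


# How much faster the first evaluation is than the single-thread baseline.
speedup_over_baseline = EVALUATION_1_GB_PER_HOUR / BASELINE_GB_PER_HOUR

# How much further the throughput must improve to reach the goal.
remaining_factor = GOAL_GB_PER_HOUR / EVALUATION_1_GB_PER_HOUR

print(f"Speed-up over baseline: {speedup_over_baseline:.1f}x")
print(f"Improvement still needed to reach the goal: {remaining_factor:.1f}x")
for label, rate in [
    ("baseline", BASELINE_GB_PER_HOUR),
    ("evaluation 1", EVALUATION_1_GB_PER_HOUR),
    ("goal", GOAL_GB_PER_HOUR),
]:
    print(f"{label}: {days_for_snapshot(rate):.0f} days per snapshot")
```

Read this way, the first evaluation is about eight times faster than the baseline, but would still need to improve by a factor of roughly 45 to reach the goal.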
   
             