
Evaluation specs platform/system level

Field
Datatype
Value
Description
Evaluation seq. num. int 1 Evaluation of LSDRT Scenario 2 - TIFF to JP2 migration and validation of the resultant JP2.
Evaluator-ID email william.palmer@bl.uk
Evaluation description text The migration of TIFF files to JP2, followed by validation of the new JP2 files using Jpylyzer.

The evaluation tests the processing speed, reliability and correctness of this migration and of the tools used.

 
Evaluation-Date DD/MM/YYYY 06/11/2012
Platform-ID string
Platform BL-0
Dataset(s) string
30 master TIFF files from JISC1 19th Century Digitised Newspapers (465MB total)
 
Workflow method string
Hadoop calling command line tools and Java code, one workflow per file.

The code consists of two parts: a Java wrapper for Hadoop and a "workflow"-style Java class that is executed once per map/file. A text file listing the locations of the input files is given as input to the wrapper.

 The wrapper code performs the following, once per input file/map:
  * Copies file to local temporary storage for processing (from HDFS)
  * Calls the "workflow" class
  * Stores outputs from the workflow class in HDFS
  * Queries the workflow class for success/failure of workflow and reports this in the final overall output from the wrapper (a CSV file: original name, success boolean, output filename)
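
The final CSV report described in the last step can be sketched as follows. This is only an illustration in Python (the evaluated wrapper is Java code running under Hadoop); the function name `write_report`, the lower-case `true`/`false` spelling and the sample file names are assumptions, not details taken from the actual implementation.

```python
import csv
import io


def write_report(results, out):
    """Write one CSV row per input file: original name, success, output filename.

    `results` is an iterable of (original_name, success, output_name) tuples.
    The lower-case boolean spelling is an assumption made for this sketch.
    """
    writer = csv.writer(out)
    for original_name, success, output_name in results:
        writer.writerow([original_name, "true" if success else "false", output_name])


if __name__ == "__main__":
    # Hypothetical file names, used only to show the row layout.
    buffer = io.StringIO()
    write_report([("page1.tif", True, "page1.zip"), ("page2.tif", False, "")], buffer)
    print(buffer.getvalue(), end="")
```

Writing one row per input file keeps the report easy to scan for failures after a run.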

The "workflow" class performs the following:
  * Checksums the input file (Java code)
  * Extracts metadata from the input file (Exiftool)
  * Migrates the input file (OpenJPEG)
  * Extracts metadata from the output file (Exiftool)
  * Extracts jpylyzer info from the output file (Jpylyzer)
  * Checks the jpylyzer output against the JPEG 2000 profile used to encode the file (Java code)
  * Generates a short report containing Jpylyzer's isValidJP2 and whether the JPEG 2000 profiles match (Java code)
  * Checksums all files (Java code)
  * Zips all files with a BagIt style structure (Java code)
  * Output includes a log of all command lines run, with stdout/stderr from each tool
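
Two of the steps above, checksumming a file and reading Jpylyzer's isValidJP2 result, can be sketched as follows. This is a Python illustration, not the Java code used in the evaluation. The checksum algorithm is not stated on this page, so MD5 is an assumption, and the function names are hypothetical. The XML lookup searches for any element named `isValidJP2` rather than assuming a particular report layout.

```python
import hashlib
import xml.etree.ElementTree as ET


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use.

    MD5 is an assumed default; the algorithm used in the evaluation is not stated.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid_jp2(jpylyzer_xml):
    """Return True if a Jpylyzer XML report contains isValidJP2 set to True."""
    root = ET.fromstring(jpylyzer_xml)
    for element in root.iter():
        # Drop any XML namespace so the lookup works with or without one.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "isValidJP2":
            return (element.text or "").strip().lower() == "true"
    return False
```

A True result here only reflects what Jpylyzer reports about the file; as the reliability assessment on this page notes, a file can pass this check and still fail to open.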

 
Workflow(s) involved
URL(s)
 
Tool(s) involved
URL(s) Debian "testing" fairly up to date at time of test

OpenJPEG - N.B. the 1.3 version in the Debian "testing" repositories does not work with TIFF input files; the 1.5.1 binaries need to be built from source.
Hadoop 1.0.4 (Apache compiled .deb)
Jpylyzer 1.6.3 (from github, compiled using pyinstaller 2.0)
Exiftool (from Debian testing)
OpenJDK 6 (from Debian testing)
 
Link(s) to Scenario(s) URL(s)
LSDRT2+Validating+files+migrated+from+TIFF+to+JPEG2000


Platform BL 0

Field
Datatype
Value
Description
Platform-ID String   Platform BL 0
Platform description String   This is a pseudo-distributed, single-node Hadoop instance running in a virtual machine on our work laptops; it is used for development. The initial evaluation will be performed on this platform, with the long-term goal of running on both the experimental DPT platform and the BL cluster.
Number of nodes integer   1
Total number of physical CPUs integer   1
CPU specs string   1 Intel Core i5-2540M CPU @ 2.6GHz
Total number of CPU-cores integer   1
Total amount of RAM in Gbytes integer   2GB
Average CPU-cores for nodes integer   1
Average RAM in Gbytes for nodes integer   2GB
Operating System on nodes String   Debian "testing", fairly current as of test date
Storage system/layer String   HDFS on virtual disk.
Network layer between nodes String   n/a
       

   

Evaluation points

metrics must come from / be registered in the metrics catalogue

Metric Baseline definition Baseline value
Goal Evaluation 1 (06-11-2012)
Evaluation 2 (date)
Evaluation 3 (date)
NumberOfObjectsPerHour Processing speed with shell script
50 1600** 87.4
ThroughputGbytesPerHour Processing speed with shell script
0.766 25** 1.355
ReliableAndStableAssessment Reliability and correctness
The workflow completed successfully and no failures were encountered at runtime. However, there is an incompatibility between OpenJPEG and the BL j2k profile: when coder bypass is enabled, the output files show compression artefacts. Also, one converted file was corrupt and failed to open, despite Jpylyzer assessing its headers as valid. This shows that Jpylyzer validation alone should not be relied on to determine whether a migration succeeded.

true false
   
OrganisationalFit     true
   
NumberOfFailedFiles
Reliability
No files failed during the workflow. However, visual review found one file that would not open in several programs, despite Jpylyzer assessing its headers as valid.

0
0*
   

Previous tests were run on the same platform, but with different data, to compare the relative times taken for the following methods of executing a single command line migration from TIFF to JP2 using OpenJPEG:

  1. Batch file
  2. Hadoop - Java class calling migration command line
  3. Hadoop - Java class executing the migration command line in a Taverna workflow via Taverna command line tool
  4. Hadoop - Java class executing the migration command line in a Taverna workflow via Taverna Server instance in Tomcat

Comparing the average runtime per file gave an indication of the average per-file overhead of each method relative to the batch file baseline:

  1. N/A (baseline)
  2. 0.69s
  3. 10.17s
  4. 25.84s
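
To put these per-file overheads in context, the sketch below scales them to the 2.2 million images of the JISC Newspapers collection. It is a rough extrapolation only: it assumes the overhead per file stays constant at scale and that files are processed one after another on a single node, neither of which was tested here. The method labels and the function name are illustrative.

```python
# Average overhead per file (seconds) for methods 2-4, relative to the batch file baseline.
OVERHEAD_SECONDS_PER_FILE = {
    "Hadoop, Java class calling command line": 0.69,
    "Hadoop, Taverna command line tool": 10.17,
    "Hadoop, Taverna Server in Tomcat": 25.84,
}

COLLECTION_SIZE = 2_200_000  # images in the JISC Newspapers collection


def total_overhead_hours(seconds_per_file, file_count=COLLECTION_SIZE):
    """Total overhead in hours, assuming constant overhead and serial processing."""
    return seconds_per_file * file_count / 3600.0


if __name__ == "__main__":
    for method, seconds in OVERHEAD_SECONDS_PER_FILE.items():
        print(f"{method}: {total_overhead_hours(seconds):,.0f} hours of overhead")
```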

** The goal values assume that we want to complete the migration of the JISC Newspapers collection (2.2 million images) over two months (60 days) and that the sample data we have used here are representative of the collection as a whole. These values are subject to change.
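
The arithmetic behind the goal values can be sketched as follows. The sketch assumes continuous 24-hour processing over the 60 days, a mean file size equal to that of the sample (465MB over 30 files) and 1 GB = 1000 MB; the function names are illustrative. Under these assumptions the required rate is a little over 1,500 objects per hour, which is consistent with the rounded goal of 1600.

```python
COLLECTION_SIZE = 2_200_000  # images to migrate
MIGRATION_DAYS = 60          # target duration
SAMPLE_FILES = 30            # files in the evaluation dataset
SAMPLE_TOTAL_MB = 465        # total size of the evaluation dataset


def required_objects_per_hour(objects=COLLECTION_SIZE, days=MIGRATION_DAYS):
    """Objects per hour needed to finish in `days`, assuming 24-hour operation."""
    return objects / (days * 24)


def throughput_gb_per_hour(objects_per_hour):
    """Convert an object rate to GB/hour using the sample's mean file size."""
    mean_file_gb = SAMPLE_TOTAL_MB / SAMPLE_FILES / 1000  # assumes 1 GB = 1000 MB
    return objects_per_hour * mean_file_gb


if __name__ == "__main__":
    print(f"Required rate: {required_objects_per_hour():.1f} objects/hour")
    print(f"Goal of 1600 objects/hour: {throughput_gb_per_hour(1600):.1f} GB/hour")
    print(f"Evaluation 1 at 87.4 objects/hour: {throughput_gb_per_hour(87.4):.3f} GB/hour")
```

On the same assumptions, the rate measured in Evaluation 1 (87.4 objects per hour) is roughly 18 times below the goal.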
