
Evaluation specs component level

Evaluation seq. num.
Evaluation of LSDRT Scenario 3 - TIFF to JP2 migration, validation of the resultant JP2 and image comparison.
Evaluator-ID email [email protected]

Evaluation description text The migration of TIFF files to JP2, followed by validation of the new JP2 files using Jpylyzer and Matchbox.

The evaluation is to test the processing speed, reliability and correctness of such a migration and the tools used.

Evaluation-Date DD/MM/YY 06/11/2012


Platform BL-0  
Dataset(s) string
30 master TIFF files from JISC1 19th Century Digitised Newspapers (465MB total)
Workflow method string
Hadoop calling command line tools and Java code, one workflow per file.

 The code consists of two parts - a Java wrapper for Hadoop and a "workflow" style Java class that is executed once per map/file.  A text file containing locations of input files is given as input to the wrapper.

 The wrapper code performs the following, once per input file/map:
  * Copies file to local temporary storage for processing (from HDFS)
  * Calls the "workflow" class
  * Stores outputs from the workflow class in HDFS
  * Queries the workflow class for success/failure of workflow and reports this in the final overall output from the wrapper (a CSV file: original name, success boolean, output filename)
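
The wrapper steps above can be sketched roughly as follows. This is an illustration in Python, not the Java/Hadoop code that was evaluated: local file copies stand in for the HDFS transfers, and the names `run_one`, `write_report` and `fake_workflow` are hypothetical. It also assumes that a workflow returns a success flag together with the path of its output (or nothing on failure), which the original text does not specify.

```python
import csv
import io
import shutil
import tempfile
from pathlib import Path


def run_one(input_path, workflow, store_dir):
    """Process one input file and return a report row:
    [original name, success boolean, output filename]."""
    input_path = Path(input_path)
    with tempfile.TemporaryDirectory() as tmp:
        # Copy the file to local temporary storage (stand-in for the copy from HDFS).
        local_copy = Path(tmp) / input_path.name
        shutil.copy(input_path, local_copy)
        # Call the workflow; assumed to return (success, output path or None).
        success, output = workflow(local_copy)
        output_name = ""
        if output is not None:
            # Store the workflow output (stand-in for the copy into HDFS).
            shutil.copy(output, Path(store_dir) / Path(output).name)
            output_name = Path(output).name
    return [input_path.name, str(success).lower(), output_name]


def write_report(rows, out):
    """Write the overall CSV report, one line per input file."""
    csv.writer(out, lineterminator="\n").writerows(rows)


def fake_workflow(local_file):
    """Hypothetical workflow: fails on empty files, otherwise writes a placeholder .zip."""
    if local_file.stat().st_size == 0:
        return False, None
    output = local_file.with_suffix(".zip")
    output.write_bytes(b"placeholder")
    return True, output


with tempfile.TemporaryDirectory() as work:
    work = Path(work)
    (work / "a.tif").write_bytes(b"data")  # will "migrate" successfully
    (work / "b.tif").write_bytes(b"")      # will fail
    store = work / "store"
    store.mkdir()
    rows = [run_one(work / name, fake_workflow, store) for name in ("a.tif", "b.tif")]
    stored = sorted(p.name for p in store.iterdir())

report = io.StringIO()
write_report(rows, report)
print(report.getvalue())
```

Because each file produces its own row, a failure in one file is recorded without interrupting the others, which matches the behaviour described in the evaluation results.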

The "workflow" class performs the following:
  * Checksums the input file (Java code)
  * Extracts metadata from the input file (Exiftool)
  * Migrates the input file (OpenJPEG)
  * Extracts metadata from the output file (Exiftool)
  * Extracts jpylyzer info from the output file (Jpylyzer)
  * Checks the jpylyzer output against the JPEG 2000 profile used to encode the file (Java code)
  * Extracts features from the input file (Matchbox)
  * Extracts features from the output file (Matchbox)
  * Compares SIFT data (Matchbox)
  * Compares profile data (Matchbox)
  * Generates a short report containing Jpylyzer's isValidJP2, whether the JPEG 2000 profiles match and whether the Matchbox SIFT comparison resulted in a value >0.9 (Java code)
  * Checksums all files (Java code)
  * Zips all files with a BagIt style structure (Java code)
  * Output includes a log of all command lines run, with stdout/stderr from each tool
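
The checksum and short-report steps in the list above could look something like the sketch below. It is written in Python for illustration only; the evaluated workflow was Java. The function names, the use of SHA-256 and the rule that overall success requires all three checks to pass are assumptions, as the text only states which three values the report contains and that the SIFT threshold is 0.9.

```python
import hashlib

SIFT_THRESHOLD = 0.9  # from the text: the SIFT comparison must give a value > 0.9


def checksum(path, algorithm="sha256"):
    """Return the hex digest of a file, read in chunks.
    The algorithm is an assumption; the source does not name one."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def short_report(is_valid_jp2, profiles_match, sift_score):
    """Collect the three checks named in the workflow description.
    Combining them with a logical AND is an assumption."""
    sift_ok = sift_score > SIFT_THRESHOLD
    return {
        "isValidJP2": is_valid_jp2,
        "profilesMatch": profiles_match,
        "siftAboveThreshold": sift_ok,
        "success": is_valid_jp2 and profiles_match and sift_ok,
    }


# Hypothetical values for one migrated file.
print(short_report(True, True, 0.95))
```

Keeping the three checks as separate fields, rather than reporting only a single verdict, makes it possible to see which check caused a file to be flagged.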

Workflow(s) involved
Tool(s) involved
URL(s) Debian "testing" fairly up to date at time of test

OpenJPEG - N.B. the 1.3 version in the Debian "testing" repositories does not work with TIFF input files; the 1.5.1 binaries need to be built from source.
Hadoop 1.0.4 (Apache compiled .deb)
Jpylyzer 1.6.3 (from github, compiled using pyinstaller 2.0)
Exiftool (from Debian testing)
OpenJDK 6 (from Debian testing)
OpenCV 2.4.2 (compiled from source)
Matchbox (from github, compiled from source)

Link(s) to Scenario(s) URL(s)
LSDRT3 Validating Migrated Images 'Visually'

Platform BL 0

Platform-ID String   Platform BL 0
Platform description String   This is a pseudo-distributed, single-node Hadoop instance running in a virtual machine on our work laptops; it is used for development. Initial evaluation will be performed on this platform, with the long-term goal of running on both the experimental DPT platform and the BL cluster.
Number of nodes integer   1
Total number of physical CPUs integer   1
CPU specs string   1 Intel Core i5-2540M CPU @ 2.6GHz
Total number of CPU-cores integer   1
Total amount of RAM in Gbytes integer   2GB
Average CPU-cores for nodes integer   1
Average RAM in Gbytes for nodes integer   2GB
Operating System on nodes String   Debian "testing", fairly current as of test date
Storage system/layer String   HDFS on virtual disk.
Network layer between nodes String   n/a


Evaluation points

metrics must come from / be registered in the metrics catalogue

Metric Baseline definition Baseline value Goal Evaluation 1 (06-11-2012) Evaluation 2 (date) Evaluation 3 (date)
NumberOfObjectsPerHour Processing speed with shell script 50
ThroughputGbytesPerHour Processing speed with shell script 0.766
ReliableAndStableAssessment Reliability and correctness
The migration completed successfully, and the success or failure of each individual migration workflow was noted in the overall output.  One file did not migrate to JP2 successfully, and this outcome was identified both in the output from the workflow and in the overall report.  The same OpenJPEG/BL profile issues were present in the output files as in LSDR2-1.  The failed migration did not affect the rest of the run, which completed successfully.

true true
true N/A
NumberOfFailedFiles Reliability
One file failed during the migration.  However, this did not stop the rest of the migration from completing and the failure was clearly identified in the outputs.


** The goal values assume that we want to complete the migration of the JISC Newspapers collection (2.2 million images) over two months (60 days) and that the sample data we have used here are representative of the collection as a whole. These values are subject to change.
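
As a rough worked example of what the assumptions in this note imply, the sketch below derives the required hourly rates from the figures given on this page (2.2 million images, 60 days, and a 30-file, 465MB sample). It additionally assumes processing runs 24 hours a day and that 1 GB = 1000 MB; neither is stated in the original, so the results are indicative only.

```python
TOTAL_IMAGES = 2_200_000   # JISC Newspapers collection size, from the note above
DAYS = 60                  # target duration, from the note above
HOURS = DAYS * 24          # assumption: processing runs around the clock

SAMPLE_FILES = 30          # sample dataset used in this evaluation
SAMPLE_MB = 465            # total size of the sample dataset

# Objects per hour needed to finish within the target duration.
required_objects_per_hour = TOTAL_IMAGES / HOURS

# Average file size, assuming the sample is representative of the collection.
avg_mb = SAMPLE_MB / SAMPLE_FILES

# Throughput needed, assuming 1 GB = 1000 MB.
required_gb_per_hour = required_objects_per_hour * avg_mb / 1000

# Ratio to the NumberOfObjectsPerHour figure of 50 listed in the table above.
LISTED_OBJECTS_PER_HOUR = 50
speedup = required_objects_per_hour / LISTED_OBJECTS_PER_HOUR

print(f"required objects/hour: {required_objects_per_hour:.2f}")
print(f"average file size (MB): {avg_mb:.1f}")
print(f"required GB/hour: {required_gb_per_hour:.2f}")
print(f"ratio to listed rate: {speedup:.1f}x")
```

Under these assumptions, roughly 1528 objects per hour would be needed, about 30 times the rate of 50 listed for the single-node development platform.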
