Evaluation specs component level
Field | Datatype | Value | Description
---|---|---|---
Evaluation seq. num. | int | 1 | Evaluation of LSDRT Scenario 3 - TIFF to JP2 migration, validation of the resultant JP2 and image comparison.
Evaluator-ID | [email protected] | |
Evaluation description | text | The migration of TIFF files to JP2, followed by validation of the new JP2 files using Jpylyzer and Matchbox. The evaluation tests the processing speed, reliability and correctness of such a migration and of the tools used. |
Evaluation-Date | DD/MM/YY | 06/11/2012 |
Platform-ID | string | Platform BL-0 |
Dataset(s) | string | 30 master TIFF files from JISC1 19th Century Digitised Newspapers (465MB total) |
Workflow method | string | Hadoop calling command line tools and Java code, one workflow per file. The code consists of two parts: a Java wrapper for Hadoop and a "workflow" style Java class that is executed once per map/file. A text file containing the locations of the input files is given as input to the wrapper. The wrapper code performs the following, once per input file/map: * Copies the file to local temporary storage for processing (from HDFS) * Calls the "workflow" class * Stores the outputs from the workflow class in HDFS * Queries the workflow class for success/failure of the workflow and reports this in the final overall output from the wrapper (a CSV file: original name, success boolean, output filename). The "workflow" class performs the following: * Checksums the input file (Java code) * Extracts metadata from the input file (Exiftool) * Migrates the input file (OpenJPEG) * Extracts metadata from the output file (Exiftool) * Extracts jpylyzer info from the output file (Jpylyzer) * Checks the jpylyzer output against the JPEG 2000 profile used to encode the file (Java code) * Extracts features from the input file (Matchbox) * Extracts features from the output file (Matchbox) * Compares SIFT data (Matchbox) * Compares profile data (Matchbox) * Generates a short report containing Jpylyzer's isValidJP2, whether the JPEG 2000 profiles match and whether the Matchbox SIFT comparison resulted in a value >0.9 (Java code) * Checksums all files (Java code) * Zips all files with a BagIt style structure (Java code). The output includes a log of all command lines run, with stdout/stderr from each tool. |
Workflow(s) involved | URL(s) | NA |
Tool(s) involved | URL(s) | Debian "testing" (fairly up to date at the time of the test); OpenJPEG (note that the 1.3 version in the Debian "testing" repositories does not work with TIFF input files, so the 1.5.1 binaries need to be built from source); Hadoop 1.0.4 (Apache compiled .deb); Jpylyzer 1.6.3 (from github, compiled using pyinstaller 2.0); Exiftool (from Debian testing); OpenJDK 6 (from Debian testing); OpenCV 2.4.2 (compiled from source); Matchbox (from github, compiled from source) |
Link(s) to Scenario(s) | URL(s) | LSDRT3 Validating Migrated Images 'Visually' |

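
The pass/fail logic of the short report and the wrapper's overall CSV output, as described under "Workflow method", can be sketched as follows. This is a minimal illustration in Python rather than the Java used in the evaluation. The class name, field names and file names are hypothetical, and combining the three checks with a logical AND is an assumption, since the page does not state how the workflow class derives its overall success flag.

```python
import csv
import io
from dataclasses import dataclass

# From the workflow description: the Matchbox SIFT comparison must give a value > 0.9.
SIFT_THRESHOLD = 0.9


@dataclass
class WorkflowReport:
    """Hypothetical container for the three checks named in the short report."""
    original_name: str
    output_name: str
    is_valid_jp2: bool      # Jpylyzer's isValidJP2 result
    profile_matches: bool   # jpylyzer output checked against the JPEG 2000 profile
    sift_score: float       # value produced by the Matchbox SIFT comparison

    @property
    def sift_ok(self) -> bool:
        # Strictly greater than the threshold, as in the description (">0.9").
        return self.sift_score > SIFT_THRESHOLD

    @property
    def success(self) -> bool:
        # Assumption: a workflow succeeds only if all three checks pass.
        return self.is_valid_jp2 and self.profile_matches and self.sift_ok


def write_summary(reports) -> str:
    """Render the overall CSV: original name, success boolean, output filename."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for report in reports:
        writer.writerow(
            [report.original_name, str(report.success).lower(), report.output_name]
        )
    return buffer.getvalue()
```

With this sketch, a file whose SIFT comparison gives exactly 0.9 is reported as failed, because the description requires a value greater than 0.9.
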
Platform BL 0
Field | Datatype | Value | Description
---|---|---|---
Platform-ID | String | Platform BL 0 |
Platform description | String | This is a pseudo-distributed, single-node Hadoop instance running on a virtual machine on our work laptops and is used for our development. The initial evaluation will be performed on this platform, with the long-term goal of running on both the experimental DPT platform and the BL cluster. |
Number of nodes | integer | 1 |
Total number of physical CPUs | integer | 1 |
CPU specs | string | 1 Intel Core i5-2540M CPU @ 2.6GHz |
Total number of CPU-cores | integer | 1 |
Total amount of RAM in Gbytes | integer | 2 |
Average CPU-cores for nodes | integer | 1 |
Average RAM in Gbytes for nodes | integer | 2 |
Operating System on nodes | String | Debian "testing", fairly current as of test date |
Storage system/layer | String | HDFS on virtual disk |
Network layer between nodes | String | n/a |

Evaluation points
Metrics must come from, or be registered in, the metrics catalogue.

Metric | Baseline definition | Baseline value | Goal | Evaluation 1 (06-11-2012) | Evaluation 2 (date) | Evaluation 3 (date)
---|---|---|---|---|---|---
NumberOfObjectsPerHour | Processing speed with shell script | 50 | 1600** | 45 | |
ThroughputGbytesPerHour | Processing speed with shell script | 0.766 | 25** | 0.697 | |
ReliableAndStableAssessment | Reliability and correctness: The migration completed successfully, and the success or failure of each individual migration workflow was noted in the overall output. One file did not migrate to JP2 successfully; this outcome was identified in the output from the workflow and in the overall report. The same issues with the OpenJPEG/BL profile were present in the output files as in LSDR2-1. The failed migration did not affect the rest of the migration, which completed successfully. | | true | true | |
OrganisationalFit | | true | N/A | | |
NumberOfFailedFiles | Reliability: One file failed during the migration. However, this did not stop the rest of the migration from completing, and the failure was clearly identified in the outputs. | | 0 | 1* | |

** The goal values assume that we want to complete the migration of the JISC Newspapers collection (2.2 million images) over two months (60 days) and that the sample data we have used here are representative of the collection as a whole. These values are subject to change.
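
The arithmetic behind the goal values can be reconstructed roughly as follows. This is a sketch under stated assumptions, not the evaluators' own calculation: it assumes a 24-hour processing day, 1 GB = 1000 MB, and that the mean file size of the 30-file sample (465MB in total) holds for the whole collection. All variable names are illustrative.

```python
COLLECTION_IMAGES = 2_200_000   # JISC Newspapers collection size, from the footnote
MIGRATION_DAYS = 60             # target duration, from the footnote
SAMPLE_FILES = 30               # sample size, from the Dataset(s) row
SAMPLE_SIZE_MB = 465            # total sample size, from the Dataset(s) row
GOAL_OBJECTS_PER_HOUR = 1600    # goal value, from the metrics table

# Assumption: processing runs around the clock for the whole period.
hours_available = MIGRATION_DAYS * 24
required_objects_per_hour = COLLECTION_IMAGES / hours_available

# Assumption: 1 GB = 1000 MB and the sample's mean file size is representative.
mean_file_size_gb = SAMPLE_SIZE_MB / SAMPLE_FILES / 1000
required_gb_per_hour = required_objects_per_hour * mean_file_size_gb
goal_gb_per_hour = GOAL_OBJECTS_PER_HOUR * mean_file_size_gb

print(f"required objects/hour: {required_objects_per_hour:.0f}")
print(f"required GB/hour:      {required_gb_per_hour:.1f}")
print(f"GB/hour at goal rate:  {goal_gb_per_hour:.1f}")
```

Under these assumptions the minimum rate is about 1528 objects per hour, which suggests that the goal of 1600 is a rounded-up figure. At 1600 objects per hour and 15.5 MB per file the throughput is about 24.8 GB per hour, in line with the goal of 25. The Evaluation 1 figures are consistent with the same file size: 45 objects per hour at 15.5 MB each is 697.5 MB, or roughly 0.697 GB per hour.
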