

William Palmer, British Library


BL 19th Century Newspapers: BL 19th Century Digitized Newspapers


BL Hadoop Platform: BL Hadoop Platform


The workflow has been implemented as a Taverna workflow, in Java code and as a batch file.

The latest Taverna workflow is here:

Latest Java code, workflow and batch files are here:

The latest (Taverna/Java) workflow contains the following steps:

  • *Retrieve TIFF file from storage (HDFS/Fedora/WebDAV)
  • Run Exiftool to extract metadata from TIFF
  • Migrate TIFF->JP2 (using OpenJPEG/Kakadu)
  • Run Exiftool to extract metadata from JP2
  • Run Jpylyzer over the JP2
  • *Run Schematron validator over the Jpylyzer output to check that the migrated image conforms to the specified profile
  • Use ImageMagick to compare TIFF and JP2
  • *Create report
  • Create output package (JP2, results, etc)
  • Post files back to relevant storage (see above)

Note that steps marked * are not performed in the batch workflow.
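
The tool-based steps listed above could be scripted roughly as follows. This is a minimal sketch, not the project's actual Taverna, Java or batch code. The function names `build_commands` and `run_all`, the output file naming, and the choice of OpenJPEG's `opj_compress` with default encoder settings are assumptions made for illustration. Storage retrieval, Schematron validation, reporting and packaging are left out.

```python
import subprocess
from pathlib import Path


def build_commands(tiff: Path, workdir: Path) -> list[list[str]]:
    """Return the external commands for one TIFF, in workflow order.

    File naming and encoder settings are illustrative assumptions.
    """
    jp2 = workdir / (tiff.stem + ".jp2")
    return [
        # Extract metadata from the source TIFF (ExifTool, JSON output).
        ["exiftool", "-json", str(tiff)],
        # Migrate TIFF -> JP2 with OpenJPEG, using default encoder settings.
        ["opj_compress", "-i", str(tiff), "-o", str(jp2)],
        # Extract metadata from the migrated JP2.
        ["exiftool", "-json", str(jp2)],
        # Run Jpylyzer over the JP2; it writes an XML report to stdout.
        ["jpylyzer", str(jp2)],
        # Compare pixel data; ImageMagick prints the metric on stderr.
        ["compare", "-metric", "PSNR", str(tiff), str(jp2), "null:"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command and collect the results without raising on failure.

    Exit codes are not checked here because ImageMagick's compare returns
    a non-zero code when the images differ, which is a result, not a crash.
    """
    return [runner(cmd, capture_output=True, text=True) for cmd in commands]
```

Keeping command construction separate from execution means the plan for a file can be inspected or logged before anything runs, and the `runner` argument lets the sequence be exercised without the tools installed.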

Requirements and Policies

NumberOfObjectsPerHour >= 1600 (This assumes we want to process the entire collection within 2 months).
ThroughputGbytesPerHour >= 25 (This assumes we want to process the entire collection within 2 months).
OrganisationalFit = "Can this workflow/solution/components be applied and used at the BL? Are the components using supported technology? etc."
NumberOfFailedFiles = 0 (We can tolerate lower speed, but under no circumstances can we lose files)
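
To make the numeric thresholds above concrete, the sketch below checks measured figures from a migration run against them. The names `RunStats` and `evaluate` are hypothetical, and the figures in the usage note are invented for illustration, not measured results. OrganisationalFit is a qualitative judgement and is not modelled.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements listed above.
MIN_OBJECTS_PER_HOUR = 1600
MIN_GBYTES_PER_HOUR = 25
MAX_FAILED_FILES = 0


@dataclass
class RunStats:
    """Measured totals for one migration run (hypothetical structure)."""
    objects: int
    gbytes: float
    hours: float
    failed_files: int


def evaluate(stats: RunStats) -> dict[str, bool]:
    """Return, per requirement, whether the run meets its threshold."""
    return {
        "NumberOfObjectsPerHour": stats.objects / stats.hours >= MIN_OBJECTS_PER_HOUR,
        "ThroughputGbytesPerHour": stats.gbytes / stats.hours >= MIN_GBYTES_PER_HOUR,
        "NumberOfFailedFiles": stats.failed_files <= MAX_FAILED_FILES,
    }
```

For example, a made-up run of 4000 objects and 60 GB in 2 hours with no failures, `evaluate(RunStats(4000, 60.0, 2.0, 0))`, meets all three thresholds. The same run with a single failed file would pass on speed but fail NumberOfFailedFiles.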


Upcoming evaluations will compare several 1 TB migrations using different storage backends and JP2 codecs.
