Platform

Purpose of this experiment


_Figure 1 (above): Taverna workflow_

Diagram of the TIFF to JPEG2000 image migration workflow. The workflow is available on MyExperiment at []
The Taverna workflow reads a text file containing absolute paths to TIF image files and converts them to JP2 image files using OpenJPEG.
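The conversion step described above could be sketched roughly as follows. This is a minimal illustration, not the Taverna workflow itself: it assumes the OpenJPEG 2.x command line tool `opj_compress` with its `-i` and `-o` options and no further encoder settings, and the function names and output directory layout are hypothetical.

```python
from pathlib import Path


def read_tif_paths(list_file):
    """Return the non-empty lines of the list file as Path objects."""
    with open(list_file, encoding="utf-8") as handle:
        return [Path(line.strip()) for line in handle if line.strip()]


def build_convert_command(tif_path, out_dir):
    """Build the command line that converts one TIF file to JP2.

    Assumption: OpenJPEG 2.x is installed and provides `opj_compress`;
    the encoder options used in the real workflow are not known here.
    """
    jp2_path = Path(out_dir) / (Path(tif_path).stem + ".jp2")
    return ["opj_compress", "-i", str(tif_path), "-o", str(jp2_path)]


def plan_migration(list_file, out_dir):
    """Return one conversion command per TIF path listed in list_file."""
    return [build_convert_command(tif, out_dir) for tif in read_tif_paths(list_file)]
```

Each returned command is a list of arguments that could be passed to `subprocess.run(command, check=True)` to perform the actual conversion.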
The following diagram shows the average execution time of each component of the workflow in seconds. It was created from a sample of 1000 images from the Austrian National Library Tresor Music Collection:


_Figure 2 (above): Execution times of each of the workflow’s steps_

In the design phase, this analysis is used to examine the average execution times of the individual tools. From this experiment we might conclude that more than 4 seconds for the FITS-based TIF image validation is too long and that this processing step needs to be improved, while the Jpylyzer validation is acceptable, taking only slightly more than 1 second per image file on average.
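A per-step analysis of this kind could be computed along the following lines. This is only a sketch: the input format (one `(step, seconds)` pair per image and step), the step names, and the time budget are assumptions made for illustration, not values taken from the experiment.

```python
from collections import defaultdict
from statistics import mean


def average_seconds_per_step(timings):
    """Average the measured seconds for each workflow step.

    timings: iterable of (step_name, seconds) pairs, one per image and step
    (a hypothetical input format).
    """
    by_step = defaultdict(list)
    for step, seconds in timings:
        by_step[step].append(seconds)
    return {step: mean(values) for step, values in by_step.items()}


def steps_over_budget(averages, budget_seconds):
    """Return the names of steps whose average exceeds the budget, sorted."""
    return sorted(step for step, avg in averages.items() if avg > budget_seconds)
```

Steps returned by `steps_over_budget` would be the candidates for optimisation before moving to the scalable workflow.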

The following diagram compares the wall clock times in seconds (y-axis) of the Taverna workflow and the Pig workflow for an increasing number of files (x-axis).

_Figure 3 (above): Wall clock times of the concept workflow and the scalable workflow_

However, the throughput we can reach using [this|SP:ONB Hadoop Platform] cluster and the chosen Pig/Hadoop job configuration is limited. As figure 4 shows, the throughput, measured in gigabytes per hour (GB/h), grows rapidly as the number of files being processed increases, and then stabilises at slightly more than 90 GB/h when more than 750 image files are processed.

!throughput_gb_per_h.png|border=1,width=654,height=363!
_Figure 4 (above): Throughput of the distributed execution measured in Gigabytes per hour (GB/h) against the number of files processed_
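For reference, a throughput value in GB/h can be derived from the total input size and the wall clock time as in the sketch below. It assumes decimal gigabytes (1 GB = 10^9 bytes); the source does not state which convention was used, and the function name is hypothetical.

```python
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes
SECONDS_PER_HOUR = 3600


def throughput_gb_per_hour(total_bytes, wall_clock_seconds):
    """Return the throughput in GB/h for a run that processed total_bytes."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return total_bytes * SECONDS_PER_HOUR / (wall_clock_seconds * BYTES_PER_GB)
```

For example, a run that processes 1 GB of images in 60 seconds corresponds to 60 GB/h.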