One line summary | The input is a list of TIFF images, some of which are compressed as "Group 4 Fax" TIFF images. This compression causes issues in some application contexts, so it may be necessary to remove the compression from a large input set of TIFF files. To achieve this, a workflow was created using the Taverna Workflow Design and Execution workbench. |
Detailed description | The diagram below gives an overview of the components used in the workflow.
[Diagram: overview of the workflow components]
Notes on reading the diagram: The green boxes are operating nodes that apply a characterisation or a file format conversion to a file. The purple boxes are the input parameters (at the top of the diagram) and the output results (at the bottom). The violet boxes, such as "readFile", "Read_Text_File", "Flatten_List" and "Get_Image_From_URL", are local services that Taverna provides by default, so they can be applied in a wide variety of data analysis and conversion contexts. The dark blue boxes are so-called splitters: they create input slots, derived from the XML descriptions of parameters (available, for example, in a Web Service Description (WSDL)), to which parameters can be connected. Finally, the brown boxes (URL2List, Beanshell) are so-called Beanshells, customisable components that define their own input and output parameters and process them using a Java scripting language (BeanShell); external Java libraries can also be used by making them available to Taverna and declaring the library as a dependency of the Beanshell. The "Get_list_of_images" component is drawn inside a surrounding box because it is a nested workflow within the containing workflow.
Parameters of the workflow
The workflow has the following parameters to configure a run:
"url_to_textfile_with_image_urls": a text file containing URL references to the images that should be processed (a batch process).
"csresult_regex": a regular expression used to identify the compression scheme; for example, the expression .*Group 4 Fax.* finds the items for which FITS identified the compression scheme "T6/Group 4 Fax".
"convert_compression": an integer that selects the compression scheme applied when converting the images identified by the regular expression above: None (0), LZW (1), PACKBITS (2), DEFLATE (3), JPEG (4), CCITT G3 Fax (5), CCITT G4 Fax (6). The value 0 therefore removes the compression.
"convert_numcolors": the number of colours that the target image should have.
Workflow execution
For batch processing, the workflow takes as input a URL reference to a text file that contains a list of URL references to the TIFF image files:
    http://<someserver>/000001.tif
    http://<someserver>/000002.tif
    http://<someserver>/000003.tif
    etc.
Taverna's list handling then passes these images one by one to the FITS operation "characteriseFile", which identifies the file format and some file properties. The result is an XML description of the identification, based on the set of identification tools that FITS uses (FITS wraps, among others, DROID and JHOVE 1, and normalises their characterisation output).
The "Read_Text_File" component reads the XML identification result and uses an XPath expression to extract the compression scheme property value:
    /default:fits/default:metadata/default:image/default:compressionScheme
In the example setting, most of the images have the compression scheme value "uncompressed" and some have the value "T6/Group 4 Fax". The aim is to identify the images with the value "T6/Group 4 Fax", so the "Beanshell" component is used to select them: it takes the characterisation results list charactres_in_list and the image list tiff_images_in_list as input and picks out the images whose characterisation result matches the regular expression csresult_regex, e.g. .*Group 4 Fax.*.
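The characterisation steps described above (splitting the URL list file into individual image URLs and extracting the compression scheme from a FITS report with the XPath expression) can be sketched in plain Java as follows. This is a minimal sketch, not the workflow's actual implementation: the class and method names are hypothetical, and it assumes that the "default" prefix in the XPath expression stands for the FITS output namespace http://hul.harvard.edu/ois/xml/ns/fits/fits_output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class illustrating the characterisation steps of the workflow.
final class FitsCompressionCheck {

    // Assumption: namespace of the FITS XML output, bound to the "default" prefix
    // that the workflow's XPath expression uses.
    static final String FITS_NS = "http://hul.harvard.edu/ois/xml/ns/fits/fits_output";
    static final String COMPRESSION_XPATH =
            "/default:fits/default:metadata/default:image/default:compressionScheme";

    /** Turns the content of the URL list file into one URL per non-blank line. */
    static List<String> parseUrlList(String text) {
        List<String> urls = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String url = line.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    /** Extracts the compressionScheme value from a FITS XML report ("" if the element is absent). */
    static String compressionScheme(String fitsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // prefixed XPath steps only match a namespace-aware DOM
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fitsXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "default".equals(prefix) ? FITS_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    return null; // not needed for evaluating expressions
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return null; // not needed for evaluating expressions
                }
            });
            return xpath.evaluate(COMPRESSION_XPATH, doc).trim();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot read FITS report", e);
        }
    }
}
```

For a report whose compressionScheme element contains "T6/Group 4 Fax", compressionScheme(report).matches(".*Group 4 Fax.*") is true, which corresponds to the per-item check the Beanshell performs with csresult_regex.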
This is the Java code snippet used in the Beanshell to filter out the "Group 4 Fax" compressed images:
    // keep only the images whose FITS characterisation result matches csresult_regex
    List tiff_images_out_list = new ArrayList();
    for (int i = 0; i < tiff_images_in_list.size(); i++) {
        String item = (String) tiff_images_in_list.get(i);     // image URL
        String charres = (String) charactres_in_list.get(i);   // matching characterisation result
        if (charres.matches(csresult_regex))
            tiff_images_out_list.add(item);
    }
The output list of the Beanshell component then contains only the images with the "Group 4 Fax" compression scheme, and these images are passed to the operation convertTIFFtoTIFFByURL, a conversion service based on the image manipulation tool "The GIMP". The service is configured by the convert_compression and convert_numcolors parameters. In this scenario, convert_compression is set to 0 (None) and the number of colours is set to 2 (bitonal).
The GIMP service uses a Java wrapper that executes GIMP on the command line. To execute the command, the wrapper uses the Java class ProcessBuilder, which takes a string array describing the command. The following array elements are an example of a GIMP command that can be passed to the ProcessBuilder and executed:
    /usr/bin/gimp
    --verbose
    -c
    -i
    -d
    -b
    (convertTIFFtoTIFF "/tmp/tmpfilefromurl4002680093931603769.tmp" "/tmp/tmpfilefromurl4002680093931603769.tmp.out.tiff" 2 0)
    -b
    (gimp-quit 0)
Here /usr/bin/gimp is the GIMP executable, -c prints messages to the console instead of showing dialogs, -i runs GIMP without its user interface, -d skips loading data such as brushes, patterns, gradients and palettes, and -b passes a command to be run in batch mode. The "convertTIFFtoTIFF" script is called with four parameters: the input and output files, the number of colours, and the compression scheme to be used (0 = None). The Java wrapper passes the parameters from the workflow layer (Taverna) down to the Script-Fu command execution layer. Finally, the batch command (gimp-quit 0) exits GIMP.
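A wrapper of this kind can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual service code: the class and method names are hypothetical, GIMP is assumed to be installed at /usr/bin/gimp, and the convertTIFFtoTIFF script is assumed to be installed where GIMP loads its scripts. Each array element is one argument, so the Script-Fu expressions reach GIMP without any shell quoting; GIMP's -b option takes one batch command and can be repeated, so the conversion and (gimp-quit 0) each get their own -b. The sketch also rejects compression codes outside the 0 to 6 range used by the script.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper illustrating how the GIMP batch command can be built and run.
final class GimpBatchCommand {

    // Assumption: GIMP installed at the path used in the example command.
    static final String GIMP_EXECUTABLE = "/usr/bin/gimp";

    /** Quotes a value as a Scheme string literal, escaping backslashes and double quotes. */
    static String schemeString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the ProcessBuilder argument list for one convertTIFFtoTIFF run. */
    static List<String> buildCommand(String infile, String outfile, int numColors, int compression) {
        if (compression < 0 || compression > 6) {
            throw new IllegalArgumentException("compression must be between 0 and 6: " + compression);
        }
        if (numColors < 1 || numColors > 256) { // an indexed image has at most 256 colours
            throw new IllegalArgumentException("num-colors must be between 1 and 256: " + numColors);
        }
        String convert = "(convertTIFFtoTIFF " + schemeString(infile) + " " + schemeString(outfile)
                + " " + numColors + " " + compression + ")";
        return Arrays.asList(GIMP_EXECUTABLE, "--verbose", "-c", "-i", "-d",
                "-b", convert,          // first batch command: the conversion
                "-b", "(gimp-quit 0)"); // second batch command: exit GIMP
    }

    /** Runs the command, forwarding GIMP's output to this process, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO();
        return builder.start().waitFor();
    }
}
```

For the scenario described above (two colours, no compression), buildCommand(infile, outfile, 2, 0) yields the argument list that is then passed to run(...).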
The following Script-Fu script (GIMP's Scheme-based scripting language) is the source of convertTIFFtoTIFF, which does the actual image conversion:
    ; Copyright (C) 2011
    ; Author: Sven Schlarb
    ; convertTIFFtoTIFF
    ;   infile       STRING  Name of file to be loaded
    ;   outfile      STRING  Name of file to be saved
    ;   num-colors   INT32   Default: 256, the number of colors to quantize to
    ;   compression  INT32   Compression type: {None (0), LZW (1), PACKBITS (2), DEFLATE (3), JPEG (4), CCITT G3 Fax (5), CCITT G4 Fax (6)}
    (define (convertTIFFtoTIFF infile outfile num-colors compression)
      (let* ((image (car (file-tiff-load 1 infile infile)))      ; 1 = non-interactive
             (drawable (car (gimp-image-active-drawable image))))
        ; flatten the image if it has an alpha channel
        ; (PDB calls return lists, so the first element is compared with TRUE/FALSE)
        (if (= (car (gimp-drawable-has-alpha drawable)) TRUE)
            (set! drawable (car (gimp-image-flatten image))))
        ; only convert to indexed if the original image is not already an indexed image
        (if (= (car (gimp-drawable-is-indexed drawable)) FALSE)
            (gimp-convert-indexed image       ; image          IMAGE   The image
                                  0           ; dither-type    INT32   {NO-DITHER (0), FS-DITHER (1), FSLOWBLEED-DITHER (2), FIXED-DITHER (3)}
                                  0           ; palette-type   INT32   {MAKE-PALETTE (0), WEB-PALETTE (2), MONO-PALETTE (3), CUSTOM-PALETTE (4)}
                                  num-colors  ; num-cols       INT32   Default: 256, the number of colors to quantize to
                                  FALSE       ; alpha-dither   INT32   Dither transparency to fake partial opacity (0: No, 1: Yes)
                                  TRUE        ; remove-unused  INT32   Remove unused or duplicate color entries from the final palette (0: No, 1: Yes)
                                  ""))        ; palette        STRING  Name of the custom palette, ignored unless palette-type is CUSTOM-PALETTE
        ; file-tiff-save saves the image in TIFF format
        (file-tiff-save 1              ; run-mode      INT32     Interactive, non-interactive
                        image          ; image         IMAGE     Input image
                        drawable       ; drawable      DRAWABLE  Drawable to save
                        outfile        ; filename      STRING    File name to save
                        outfile        ; raw-filename  STRING    File name to save
                        compression))) ; compression   INT32     {None (0), LZW (1), PACKBITS (2), DEFLATE (3), JPEG (4), CCITT G3 Fax (5), CCITT G4 Fax (6)}
    (script-fu-register "convertTIFFtoTIFF"
                        "<Toolbox>/Xtns/Script-Fu/aqua/convertTIFFtoTIFF"
                        "Convert TIFF to TIFF"
                        "AQuA"
                        "Copyright 2011"
                        "2011-06-15"
                        ""
                        SF-FILENAME "Infile" "infile.tiff"
                        SF-FILENAME "Outfile" "outfile.tiff"
                        SF-VALUE "num-colors" "256"
                        SF-VALUE "compression" "0")
Note that to make a new script available to GIMP, the scripts have to be refreshed ("Refresh Scripts" in GIMP), even if GIMP is only used from the command line; otherwise GIMP is not aware of the new script.
Finally, the operation characteriseFile uses FITS again to identify the conversion result and verify that the compression has been removed correctly.
The workflow can be downloaded from myExperiment. |
Solution champion | Sven Schlarb |
myExperiment link | http://www.myexperiment.org/workflows/2174 |
Evaluation | |
Tool (link) | Taverna Workflow Design and Execution workbench |
Issue | BOPCRIS issue - Mix of compressed and uncompressed TIFFs |