h1. Task details

Tools developed in other digital preservation projects, such as PLANETS and CRiB, do not cope well with all
aspects of large-scale processing. Fault tolerance and workflow provenance, for example, must be part of what
the SCAPE tools offer. This task will adapt existing tools (identified in Task 1 of WP10) to be compatible with the
large-scale characteristics of the SCAPE platform.

h1. Tool wrapper approach

Based on what is known about the PT characteristics, a general approach built on the toolwrapper was chosen. Each tool is described in a machine-readable language (XML, conforming to an XML schema), and from that single description automated tools (scripts and others) generate different outputs for use in different contexts, such as the command line or web services.

New toolspec schema: [https://github.com/openplanets/scape/blob/master/doc/WP.02.XA.Technical.Coordination/toolspec/tool-1.0_draft.xsd]

New version of the toolwrapper (only bash wrapper for now): [https://github.com/openplanets/scape/tree/master/pc-as/toolwrapper/]

h2. Toolwrapper outputs

For now, only bash wrappers are generated. Each one is shipped in a Debian package that contains the bash wrapper, a man page and a single-step Taverna workflow.
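
To make the command-line contract of such a wrapper more concrete, the following is a minimal sketch of what a bash wrapper might look like. It is not actual toolwrapper output: the function name {{wrapper_sketch}} is hypothetical, {{cat}} merely stands in for the real tool invocation, and the handling of the {{STDIN}}/{{STDOUT}} keywords is an assumption inferred from the usage examples on this page.

```bash
#!/bin/bash
# Hypothetical sketch of a generated wrapper; NOT actual toolwrapper output.
# ASSUMPTION: `cat` stands in for the real tool command taken from the toolspec.

wrapper_sketch() {
    local input="" output="" opt OPTIND=1
    # Parse the -i (input) and -o (output) options.
    while getopts "i:o:" opt; do
        case "$opt" in
            i) input=$OPTARG ;;
            o) output=$OPTARG ;;
            *) echo "usage: wrapper_sketch -i <file|STDIN> -o <file|STDOUT>" >&2
               return 2 ;;
        esac
    done
    # Both options are mandatory in this sketch.
    if [ -z "$input" ] || [ -z "$output" ]; then
        echo "usage: wrapper_sketch -i <file|STDIN> -o <file|STDOUT>" >&2
        return 2
    fi
    # Map the STDIN/STDOUT keywords onto the process streams.
    if [ "$input" = "STDIN" ]; then input=/dev/stdin; fi
    if [ "$output" = "STDOUT" ]; then output=/dev/stdout; fi
    # Stand-in for the wrapped tool: copy the input to the output unchanged.
    cat "$input" > "$output"
}

# Usage mirroring the pipe-based examples on this page.
printf 'SCAPE project\n' | wrapper_sketch -i STDIN -o STDOUT
```

Keeping the option parsing separate from the tool invocation is what would let one description drive several wrapped tools; only the last command in the function would differ per tool.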


h1. How to install tools

{info}These installation notes were tested on Debian 6.0.5{info}
*1)* Add the KEEPS Debian package repository (the final versions of the tools will also be added to the OPF Debian repository):
{code:language=bash}$ sudo -E wget --output-document=/etc/apt/sources.list.d/scape.keep.pt.list http://scape.keep.pt/apt/stable.list && wget -q http://scape.keep.pt/apt/rep.key -O- | sudo apt-key add -{code}
*2)* Add the debian-multimedia repository (needed to install handbrake-cli):
{code:language=bash}$ echo "deb http://www.deb-multimedia.org squeeze main non-free" | sudo tee /etc/apt/sources.list.d/deb-multimedia.list{code}

*3)* Update the list of packages known to apt (so that the packages from the newly added repositories become available):

{code:language=bash}$ sudo apt-get --quiet 2 update{code}

*4)* Install all migration tools (using a metapackage):
{code:language=bash}$ sudo apt-get install digital-preservation-tools-migration{code}

*5)* See what migration tools have been installed:

Option 1 (use bash completion or the equivalent feature of another shell):
{code:language=bash}$ digital-preservation-migration   # then press TAB{code}

Option 2 (use apt-cache):
{code:language=bash}$ apt-cache show 'digital-preservation-migration-*' | grep -E "^(Package|Description|\s)"{code}
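
For scripts, where TAB completion is not available, the same listing can be approximated by scanning the directories on {{PATH}}. This is only a sketch: the function name {{list_tools}} and its optional second argument are illustrative and not part of the packages.

```bash
#!/bin/bash
# List every executable whose name starts with a given prefix.
# ASSUMPTION: `list_tools` is an illustrative helper, not shipped with the tools.
# Arguments: <prefix> [colon-separated search path, defaults to $PATH]

list_tools() {
    local prefix=$1 search_path=${2:-$PATH} dir file
    local IFS=:
    for dir in $search_path; do
        for file in "$dir"/"$prefix"*; do
            # Skip unmatched globs, directories and non-executable files.
            if [ -f "$file" ] && [ -x "$file" ]; then
                basename "$file"
            fi
        done
    done | sort -u
}

list_tools digital-preservation-migration-
```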


h1. Usage examples

*1)* Create a PDF containing the sentence "SCAPE project":

{code:language=bash}$ echo "SCAPE project" > file.txt
$ digital-preservation-migration-office-pdfbox-txt2pdf -i file.txt -o out.pdf{code}
*2)* Create a PDF containing the sentence "SCAPE project" with a single command using pipes:
{code:language=bash}$ echo "SCAPE project" | digital-preservation-migration-office-pdfbox-txt2pdf -i STDIN -o out.pdf{code}
*3)* Test the identity function using PDFBox, i.e., create a PDF from the sentence "SCAPE project" and convert the output (PDF) back into text, so that the original text can be compared with the text extracted from the PDF:
{code:language=bash}$ echo -n "SCAPE project" | digital-preservation-migration-office-pdfbox-txt2pdf -i STDIN -o STDOUT | digital-preservation-migration-office-pdfbox-pdf2txt -i STDIN -o STDOUT{code}
Note: The original text and the output of the previous command look identical, but they are not. Piping the result into "od \-c" shows that a space and a line ending were added, in this case by PDFBox while converting the text to PDF.
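
The byte-level comparison described in the note can be sketched as follows. To keep the example runnable without the tools installed, the function {{roundtrip}} only simulates the pipeline output (the original text plus a space and a line ending, assumed here to be a single LF); it does not run PDFBox. With the tools installed, its body could be replaced by the pipeline from example 3.

```bash
#!/bin/bash
# Compare the original text with the round-tripped text, byte by byte.
# ASSUMPTION: `roundtrip` simulates the txt2pdf | pdf2txt output described in
# the note (extra space plus LF); it does not call the real tools.

original()  { printf 'SCAPE project'; }
roundtrip() { printf 'SCAPE project \n'; }

# Show every byte, including whitespace that is invisible on a terminal.
original  | od -c
roundtrip | od -c

# Compare the two dumps rather than the text as displayed.
if [ "$(original | od -c)" = "$(roundtrip | od -c)" ]; then
    echo "identical"
else
    echo "different"
fi
```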


h1. Workflows location

Workflows are located under */usr/share/doc/*, each in a folder named exactly after its Debian package.

For example, for an action tool called *digital-preservation-migration-office-abiword-doc2html*, the workflow can be retrieved with the following shell command:
{code:language=bash}$ gunzip -c /usr/share/doc/digital-preservation-migration-office-abiword-doc2html/digital-preservation-migration-office-abiword-doc2html_bash.t2flow.gz > digital-preservation-migration-office-abiword-doc2html_bash.t2flow{code}
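
To retrieve the workflows of all installed tools at once, the same command can be wrapped in a loop. This is a sketch that assumes every package follows the folder and file naming shown above; the function name {{extract_workflows}} and the destination folder {{./workflows}} are illustrative.

```bash
#!/bin/bash
# Extract every gzipped workflow found under a doc root into one folder.
# ASSUMPTION: workflows follow the <package>/<package>_bash.t2flow.gz layout;
# `extract_workflows` and its arguments are illustrative names.

extract_workflows() {
    local doc_root=$1 dest=$2 archive
    mkdir -p "$dest"
    for archive in "$doc_root"/digital-preservation-migration-*/*_bash.t2flow.gz; do
        # If the glob matched nothing, the pattern itself is returned: skip it.
        [ -e "$archive" ] || continue
        # Write the decompressed workflow under its name without the .gz suffix.
        gunzip -c "$archive" > "$dest/$(basename "$archive" .gz)"
    done
}

extract_workflows /usr/share/doc ./workflows
```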

h1. Future work

* Develop a script that, based on the toolspecs, publishes tools to the SCAPE component catalogue.
* Improve the wrapped tools by adding more parameters (only a small set of tools currently accept parameters, because parameters were not described in the tool descriptions).
* Wrap more action tools, especially CC and QA tools.
* Carry out a gap analysis to obtain metrics for assessing the need for new or improved tools.