h1. Resources

h2. GoPortis GitHub project

h2. Installation of VirtualBox

To check that your VirtualBox and Vagrant installations are working, open a terminal (command-line interface) in an empty directory and type:

{code}vagrant init ubuntu/precise32{code}

You should see:
{code}A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on`` for more information on using Vagrant.{code}

If so, try typing:

{code}vagrant up{code}

This should start up a virtual machine image. It will take a minute or two, and the output should begin:

{code}Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/precise32'...{code}

Once it has finished, test that it worked by typing:

{code}vagrant ssh
ls /vagrant{code}

which should give the output:

{code}vagrant@<hostname>:~$ ls /vagrant
vagrant@<hostname>:~${code}

If that's the case, tidy up by typing:

{code}vagrant halt
vagrant destroy{code}
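
The manual steps above could also be scripted. The following is a minimal Python sketch, not an official tool: the function names ({{missing_tools}}, {{smoke_test_commands}}, {{run_smoke_test}}) are hypothetical, and it assumes that {{vagrant}} and VirtualBox's {{VBoxManage}} command are expected on the PATH. It uses {{vagrant ssh -c}} to run a single command in the guest and {{vagrant destroy -f}} to skip the confirmation prompt.

```python
import shutil
import subprocess

BOX = "ubuntu/precise32"  # the box name used in the manual steps


def missing_tools(tools=("vagrant", "VBoxManage")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def smoke_test_commands(box=BOX):
    """Return the manual steps as argument lists, in the order they are run."""
    return [
        ["vagrant", "init", box],
        ["vagrant", "up"],
        # `vagrant ssh -c` runs one command in the guest instead of opening a shell.
        ["vagrant", "ssh", "-c", "ls /vagrant"],
        ["vagrant", "halt"],
        # `-f` skips the confirmation prompt that `vagrant destroy` normally shows.
        ["vagrant", "destroy", "-f"],
    ]


def run_smoke_test(box=BOX, runner=subprocess.run):
    """Run each step in the current directory, stopping at the first failure.

    `runner` is injectable so the sequence can be exercised without Vagrant.
    """
    missing = missing_tools()
    if missing:
        raise RuntimeError("Not found on PATH: " + ", ".join(missing))
    for command in smoke_test_commands(box):
        # check=True raises CalledProcessError if a step exits with an error.
        runner(command, check=True)
```

Calling {{run_smoke_test()}} from an empty directory runs the same sequence as the manual steps. Because it stops at the first failing step, a failure part-way through may leave a virtual machine behind that still needs {{vagrant destroy}}.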

If anything seems to go amiss, feel free to contact our Technical Lead: carl [at] openplanetsfoundation [dot] org

h2. Software

* [JHOVE |] Bespoke PDF module used by the DP (digital preservation) community.
* [Apache Tika |] Open source characterisation and content extraction tool.
* [Apache PDFBox |] The open source PDF parsing library that powers [Apache Tika |].
* [pdfeh |] A wrapper around the PDFBox preflight functionality.
* [pdf-preflight |] A Ruby pre-flight project on GitHub.

h1. Ideas

Some ideas for tools that would be useful to build during the Hackathon:

* Create a scalable test of whether a PDF file can be opened by Acrobat Reader, using the [PDF Library |] (which I guess is not open source).
* Create a scalable comparison workflow that converts both PDF files (the original and the new representation) to images and compares them, e.g. with the matchbox tool, to detect visible differences.
* Idea from Andres/Slub on PDF repair and QA: save all the PDF objects (e.g. streams, hashmaps, strings, floats) in a list and save the MD5 checksum. Repair the structure of the PDF and put all the objects back. The MD5 checksum should not have changed.
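
The checksum idea in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it hashes each object separately rather than the list as a whole, the function names are hypothetical, and {{extract_streams}} is a deliberately naive regular-expression matcher that only finds stream bodies. A real tool would obtain the objects from a PDF parser such as PDFBox and honour each stream's {{/Length}} entry.

```python
import hashlib
import re

# Naive matcher for stream bodies; a real tool would use a PDF parser and /Length.
_STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def extract_streams(pdf_bytes):
    """Return the raw stream bodies found in `pdf_bytes`, in file order."""
    return _STREAM.findall(pdf_bytes)


def md5_digests(objects):
    """Return one MD5 hex digest per object (each object is a bytes value)."""
    return [hashlib.md5(obj).hexdigest() for obj in objects]


def changed_indices(before, after):
    """Return positions whose digests differ between the two object lists.

    Positions present in only one list are reported as changed too.
    """
    old, new = md5_digests(before), md5_digests(after)
    longest = max(len(old), len(new))
    return [
        i for i in range(longest)
        if i >= len(old) or i >= len(new) or old[i] != new[i]
    ]
```

Running {{changed_indices(extract_streams(original), extract_streams(repaired))}} on the bytes of the original and repaired files returns an empty list when every stream survived the repair unchanged. Comparing per object rather than with a single checksum shows which object was altered, at the cost of storing one digest per object.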