AQuA Mashup Tool List


List of Tools

The following list includes all of the tools examined or used by participants in the AQuA Mashups. You can either:

  • Add a brief tool description to the table on this page, or
  • Add a richer entry to The Registry (this is almost as quick, but much better!)
Tool Name                URL                Description / Use                Link to AQuA solution where the tool is used
File Utility
Open source file format identification utility, written in C and packaged with every Unix-like distribution.
Bugs: and also
Sanselan From site:  This Pure-Java library reads and writes a variety of image formats, including fast parsing of image info (size, color space, icc profile, etc.) and metadata. Might be handy! :-)
tiff2RDF - visualising image collection consistency
No doubt you all know about this, but I'm just adding them as I find them! :-) Extracts technical metadata from a number of file formats including images and audio!
JHOVE Extracts properties from files and attempts to validate against format spec. Supports AIFF ASCII GIF HTML JPEG JPEG2000 PDF TIFF UTF-8 WAVE XML. jp2 header analysis

JHOVE2 Successor to JHOVE. Integrates DROID. Supports ICC NetCDF SGML Shapefile TIFF UTF-8 WAVE XML.  
DROID Identifies files based on internal 'magic' signatures, or file extension. Notes if these are inconsistent. GUI.
Metadata/properties extraction (and editing) tool that supports dozens of formats, with an emphasis on image formats. Might well be the best properties extraction tool in existence, but strangely ignored by most of the digital preservation community ....
PSTViewTool Open source projects from Microsoft, a PST viewer and the underlying PST access library (C++)
libPST PST manipulation/migration library. See JvdK's comment here.

Unpaper For post-processing scans. Can spot rotation, black marks, etc., and so may be of diagnostic use.
b2x Translator doc/ppt/xls to docx/pptx/xlsx conversion tools from a Microsoft partner.
Open source Java deskewing library
Email message to XML file extractor for digital preservation created by the Persistent Digital Archives and Library System (PeDALS) research project  
FITS The File Information Tool Set (FITS) identifies, validates, and extracts technical metadata for various file formats. It wraps several third-party open source tools, normalizes and consolidates their output, and reports any errors. Includes JHOVE, DROID, file, and others.
EAP File Verification
Identify compressed TIFFs and convert them to uncompressed TIFFs
tiff2RDF - visualising image collection consistency

ODF Converter The goal for this project is to provide translators to allow for interoperability between applications based on ODF (OpenDocument) standards (currently ODF 1.1) and Microsoft OpenXML based Office applications. ... Along with the add-ins for Microsoft Word, Excel and PowerPoint, we also provide a command line translator that allows doing batch conversions. These translators can also be run on the server side for certain scenarios.  
ODF Toolkit
Includes a validator. Mostly Java with some .Net code too.
See also  
PDFSSA4MET PDF Structure and Syntactic Analysis for Metadata Extraction and Tagging. PDFSSA4MET attempts to provide metadata extraction and tagging based on structural and syntactic analysis of content in XML.  
JODConverter JODConverter, the Java OpenDocument Converter, converts documents between different office formats.
It leverages OpenOffice.org, which provides arguably the best import/export filters for OpenDocument and Microsoft Office formats available today.
Email Preservation Parser    
pHash The open source perceptual hash library. A perceptual hash is a fingerprint of a multimedia file derived from various features from its content. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the features are similar.
See also
Identifying rotated, duplicate images using pHash
Fiji is an image processing package. It can be described as a distribution of ImageJ together with Java, Java 3D and a lot of plugins organized into a coherent menu structure.
See also,
getID3() is a PHP library that extracts useful information from MP3s & other multimedia file formats. AQUAdio - characterization of user-generated audio field recordings
The GIMP GIMP is the GNU Image Manipulation Program. Identify compressed TIFFs and convert them to uncompressed TIFFs
Taverna Taverna is an open source Workflow Management System. It consists of a suite of tools used to design and execute scientific workflows.  
Cue A small Java library for simple text analysis - counting strings, identifying languages, and removing stop words. Used in futureArch's very simple word cloud generation.
Apache Tika
The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.
Characterising Externally Generated Content
AQDC - Document Compare
Apache Lucene
Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Characterising Externally Generated Content
Analysis of Lucene Index Word Frequency
Java Image Comparison
Basic image comparison for duplication/differences based on block by block comparison
java image blocks comparison
BWF MetaEdit
For extracting file-specific metadata (sample rate, sample bit-rate). Audio Auditing Script
For audio fingerprinting (which also relies on SoX). Both client and server software are required. Audio Auditing Script
JPEG2000 software framework
jp2 header analysis
ssdeep is a program for computing context triggered piecewise hashes (CTPH). Also called fuzzy hashes, CTPH can match inputs that have homologies. Such inputs have sequences of identical bytes in the same order, although bytes in between these sequences may be different in both content and length. ssdeep uses a rolling hash algorithm, hence changes to the file will result in only localized changes in the CTPH signature. ssdeep for duplicate image detection
pdiff: Perceptual Image Difference utility
Image comparison/differencing tool
Perceptual Image Diff comparison
Bitmap image software suite
tiff2RDF - visualising image collection consistency
The OpenJPEG library is an open-source JPEG 2000 codec written in C language. It has been developed in order to promote the use of JPEG 2000, the new still-image compression standard from the Joint Photographic Experts Group (JPEG). Validating TIFF to JPEG2000 migration
Compare OCR results of the same source material in different formats (TIFF, JP2)
Apache POI
Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). Apache POI Office Document Analyser
OCR engine
Compare OCR results of the same source material in different formats (TIFF, JP2)
Java PDF library for creation, manipulation and content extraction of PDF documents. Detect, extract and analyse embedded objects in PDFs
PDF Characterisation Tool
PDF library for manipulation, content extraction and creation
PDF Characterisation Tool
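
The pHash entry in the list above contrasts perceptual hashes with cryptographic hashes. The following is a minimal sketch of that contrast, not pHash's own algorithm: it uses a simple "average hash" (one bit per pixel, set when the pixel is brighter than the image mean) as a stand-in perceptual hash, and SHA-256 as the cryptographic hash. The function names and the tiny 4x4 greyscale "images" are made up for illustration.

```python
import hashlib


def average_hash(pixels):
    """Illustrative perceptual hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def sha256_hex(pixels):
    """Cryptographic hash of the raw pixel bytes (values must be 0-255)."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Hypothetical 4x4 greyscale image: dark left half, bright right half.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
# Same picture with every pixel slightly brightened.
brighter = [[p + 3 for p in row] for row in original]
# A different picture: left and right halves swapped.
mirrored = [row[::-1] for row in original]

# Perceptual hashes stay close for similar content and move apart for different content.
dist_similar = hamming(average_hash(original), average_hash(brighter))
dist_different = hamming(average_hash(original), average_hash(mirrored))

# Cryptographic hashes change completely even for the small brightness change.
crypto_equal = sha256_hex(original) == sha256_hex(brighter)

print(dist_similar, dist_different, crypto_equal)
```

In practice a tool such as pHash derives its fingerprint from more robust image features, and a distance threshold (rather than exact equality) decides whether two files count as near-duplicates.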

