  • Identifying the content of MS Office documents One line summary We have OLE2 Office documents, which may contain more documents, and we want to identify which version of Office each was created by. Detailed description The older binary Office document formats (OLE) are effectively file systems, and the format informati
  • EAP File Verification One line summary When media are detected, the tool will identify the selected format and identify valid / invalid / broken files Detailed description Solution for EAP Issue 1 Broken TIFFs Same tool as for Solution 3 Developed using PHP and FITS (and a bit of JQuery/UI for results page). 1. Scan sp
  • Maintain a list of metadata mappings outside of the script Title maintain a list of metadata mappings outside of the script Detailed description A PHP script invoking exiftool which returns a PHP array. This array is used to fill in an XML template, which can be edited at will. Outside metadata is contained in a .ini f
  • JJ2000 Summary Purpose Pure Java implementation of a JPEG2000 decoder Homepage Licence LGPL/Other Description This project is a JPEG 2000 encoder/decoder written in pure Java. Its goal is to be a reference implementation of (at least) part 1 of the JPEG 2000 specification. The code here
  • IS8 Diversity of office document formats in digital objects archive Title Diversity of office document formats in digital objects archive Detailed description Document instances of many different file formats are referenced in web content. Many of these formats might not be renderable in a web archive viewer in the fut
  • [Figure: identifycompresseduncompressed2.png] Notes in order to understand the diagram The green boxes are operating nodes that apply a characterisation or file format … of identification tools that FITS uses (FITS wraps e.g. Droid, Jhove 1 among others and normalizes the characterisation output). The „ReadTextFile“ component reads
  • AQUAdio characterization of user-generated audio field recordings One line summary Tool to extract audio properties and metadata from audio files Detailed description AQUAdio is a wrapper script around the open-source getID3() PHP library. It extracts all possible information from audio files (MP3, WAV, MP4, AIFF, etc.) s
  • Audio Auditing Script One line summary A script to check a collection of audio recordings for 1) expected files, 2) expected specification, and 3) provenance. Detailed description audioaudit: Audit a series of Wave files. OPTIONS: -h Show this message. -i Input path (mandatory). -f Perform fingerprint verification
  • 08 Characterising content in web archives with Nanite.pdf

    Characterising content in web archives with Nanite William Palmer SCAPE Information Day British Library, UK, 14th July 2014 • When web sites are harvested … ? • You name it 3 Characterisation This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1
  • Apache POI Office Document Analyser One line summary A utility based on Apache POI that is able to analyse MS Office documents. Detailed description Uses POI to walk through the OLE file structures and look for embedded objects and their properties. Solution champion anjackson Git link