Label: characterisation

Content with label characterisation in SPRUCE
Related Labels: php, exiftool, parallel, processing, solution, multi, mapping, identification, cataloguing, spruce_london_2, spruce_london, keyword, metadata, fixity, xml, process, structural_relationships, bit_rot_detection, frequency

Page: Distinguishing Files with Descriptive Metadata
Distinguishing Files with Descriptive Metadata. A Java program that uses a custom Apache Tika wrapper to extract file format identification and metadata from a directory of files, and presents aggregated data for identifying which files have full descriptive metadata ...
Other labels: spruce_london, solution, identification
Page: Extracting and aggregating metadata with Apache Tika
Extracting and aggregating metadata with Tika. At the Glasgow Mashup, Peter May created a Python wrapper for Apache Tika. Carl Wilson extended this work, creating a Java utility class that wrapped Tika, providing simple configuration, two types of call to Tika ...
Other labels: spruce_london, solution
Page: File Format Identification and Metadata Extraction using FITS
Title: File Format Identification and Metadata Extraction using FITS. Detailed description: Practitioners who are new to digital preservation are often looking for ways to identify file format types in their collections and extract metadata from these files. The best way to get ...
Other labels: spruce_london_2, solution, identification
Page: freqy - word clouds for directories
freqy: word clouds for directories. Detailed description: A recurring issue in SPRUCE mashups has been this: when presented with a load of unknown files, how does anyone go about cataloguing them? freqy is one way to help. It started life as a word ...
Other labels: spruce_london_2, solution, keyword, frequency, cataloguing
Page: Identifying differences between metadata in files and copying metadata between files
Title: Identifying differences between metadata in files and copying metadata between files. Detailed description: Original TIFF files are stored as master files and JPEG files are produced from them as access copies. Curators manually modify the access JPEG ...
Other labels: spruce_london_2, solution, structural_relationships
Page: Maintain a list of metadata mappings outside of the script
Title: Maintain a list of metadata mappings outside of the script. Detailed description: A PHP script that invokes exiftool and returns a PHP array. This array is used to fill in an XML template, which can be edited at will. Outside metadata is contained in a .ini ...
Other labels: spruce_london, solution, metadata, xml, php, exiftool, mapping
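As an illustration of the idea in the entry above (mappings kept in a .ini file and used to fill an editable XML template), here is a minimal sketch. It is written in Python rather than the PHP of the original solution, and it does not call exiftool: the `sample` dictionary stands in for extracted metadata. The section name `mappings`, the tag names, and the template layout are all assumptions made for this example, not details taken from the page.

```python
import configparser
from string import Template
from xml.sax.saxutils import escape

# Hypothetical mapping file content: XML placeholder name = extracted tag name.
MAPPING_INI = """
[mappings]
title = Title
creator = Artist
created = CreateDate
"""

# Hypothetical XML template; placeholders use string.Template syntax ($name).
XML_TEMPLATE = Template(
    "<record>\n"
    "  <title>$title</title>\n"
    "  <creator>$creator</creator>\n"
    "  <created>$created</created>\n"
    "</record>"
)


def load_mappings(ini_text):
    """Parse the .ini text and return {xml_field: metadata_tag}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(ini_text)
    return dict(parser["mappings"])


def fill_template(metadata, mappings, template=XML_TEMPLATE):
    """Look up each mapped tag in the metadata and fill the XML template."""
    values = {
        field: escape(str(metadata.get(tag, "")))  # missing tags become empty
        for field, tag in mappings.items()
    }
    return template.substitute(values)


# Stand-in for metadata returned by an external extraction tool.
sample = {"Title": "Harbour & quay", "Artist": "Unknown", "CreateDate": "2012:01:30"}
record = fill_template(sample, load_mappings(MAPPING_INI))
print(record)
```

Because the mappings live outside the code, a field can be remapped by editing the .ini text alone, which appears to be the point of the original approach.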
Page: Parallel processing of identification and characterisation jobs
Here are some ideas and suggestions regarding parallel processing, focusing on running identification and characterisation jobs in parallel. Please contribute and comment! techmaurice / 30012012 IMHO we should also give multi/parallel processing more attention. Most (non-Java ...
Other labels: parallel, processing, multi, process, identification
Page: Solving TIFF malformation using exiftool
Title: Solving TIFF malformation using exiftool. Detailed description: The issue page http://wiki.opflabs.org/display/SPR/ValidandwellformedTIFF%27swithscanlinecorruption describes the problem as (essentially): TIFF files being unusable, despite being "validated" by tools like JHOVE. Solution ...
Other labels: spruce_london, solution, bit_rot_detection
Page: Using Perl to write scripts for reporting on the content of the collection
Title: Using Perl to write scripts to find duplicates for reporting on the content of the collection. Perl was used to write scripts that used the metadata extracted with Apache Tika (see SPR:Extracting and aggregating metadata with Apache Tika) to help locate duplicates and different ...
Other labels: spruce_london, solution, fixity
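The excerpt above is truncated and does not say how the Perl scripts detected duplicates. One common approach, consistent with the fixity label, is to group files by checksum. The following is a minimal Python sketch of that general technique, not the original scripts; the function names `sha256_of` and `find_duplicates` are hypothetical, and it hashes file contents directly instead of reading Tika output.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by checksum; return groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("some/collection")` (a hypothetical path) returns a list of lists, each holding the paths of files with identical content.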