Page: Ability to automatically identify script files
Detailed description: Accurate, automated identification of the various file format types is necessary in order to manage and preserve such objects effectively. At present, many of the file format identification tools do not identify ...
Other labels: york_hackathon, identification, unknown_file_formats

Page: Analyzing a disk image of a 12-year old laptop
Detailed description: This laptop belonged to an academic researcher and contains a variety of files and file formats that mix professional work and personal information. Third-party privacy issues exist with respect ...
Other labels: opf, appraisal_assessment

Page: Automatically extracting metadata for Grey Literature reports
Detailed description: Sometimes we receive batches of grey literature reports for which we don't have any metadata. This means we cannot include them in the grey literature library ...
Other labels: york_hackathon, unknown_characteristics, appraisal_assessment

Page: Check content of e-pub against digitized book
Detailed description: The French National Library has begun creating EPUBs from its digitized books. The books are also available as images (TIFF masters, JPEG for dissemination) and increasingly as full text ...
Other labels: opf_montpellier, qa

Page: Checking that significant properties are preserved after migration
Detailed description: After a file migration (from PDF to PDF/A, DOC to DOCX, or DOC to PDF/A), we should check that the conversion has been successful and that the significant properties of the object are maintained ...
Other labels: york_hackathon, qa

Page: Common validation error messages from PDF to PDFA conversion
Detailed description: The validation error messages reported by two pieces of validation software run on the same file. From PDFBox Preflight: Invalid font definition ...
Other labels: untagged, unsolved_issue, opf, preservingpdf

Page: Data Extraction from real world Android Phone Images through BW-FLA Emulation as a service
Detailed description: The challenge is to extract images from real Android phones (such as the Nexus series devices) and mount them in the BW-FLA emulation environment. Based on this setup ...
Other labels: unsolved_issue, opf, hackathon, android, emulation, database, preservation, data_capture

Page: Decoding JP2 with OpenJPEG goes wrong in case of embedded ICC profiles
Summary: Decoding a JP2 that contains an embedded ICC profile with OpenJPEG 1.5 (command-line tool) results in a degraded output image. Original test data: testOpenJPEGJvdK27082012.zip. Source image: We're starting with the image below, which is a losslessly compressed ...
Other labels: jp2, unsolved_issue, opf

Page: Deduplication
Detailed description: Collection owners need a way to easily identify duplicates in a collection. Duplicates are a common and seemingly simple issue, but the fact that the problem is rarely cracked illustrates its complexity. In a collection of several hundred items it may be possible to identify duplicates manually ...
Other labels: york_hackathon, duplication

Page: E-mail Threads - relinking the conversation
Detailed description: Email threads (RE: / FW:) in a mailbox are technically linked to each other through an ID, as long as people follow a "good email protocol". The problem starts when people ...
Other labels: opf_montpellier, context, structural_relationships
