Label: york_hackathon

All content with label york_hackathon.
Related Labels: solution, spruce_glasgow, convert, email, embedded_objects, identification, bit_rot, obsolescence, vector, duplication, data_capture, issue, appraisal_assessment, spruce, qa, harvesting, characterise, fonts, unknown_file_formats

Page: Ability to automatically identify script files (Practical Preservation Issues)
Title: Ability to automatically identify script files. Detailed description: It is necessary to provide accurate and automated identifications for various file format types in order to effectively manage and preserve such objects. At present, many of the file format identification tools do not identify ...
Other labels: issue, identification, unknown_file_formats
Page: Automatically extracting metadata for Grey Literature reports (Practical Preservation Issues)
Title: Automatically extracting metadata for Grey Literature reports. Detailed description: Sometimes we receive batches of grey literature reports for which we don't have any metadata. This means we cannot include them in the grey literature library ...
Other labels: issue, unknown_characteristics, appraisal_assessment
Page: Checking that significant properties are preserved after migration (Practical Preservation Issues)
Title: Checking that significant properties are preserved after migration. Detailed description: After a file migration (from PDF to PDF/A, DOC to DOCX, DOC to PDF/A) we should check that the conversion has been successful and that the significant properties of the object are maintained ...
Other labels: issue, qa
Page: Convert embedded fonts to outlines (Practical Preservation Issues)
I have an EPS file that contains text in a particular, unusual font. The font has been embedded, so the rendering of the EPS looks fine. I can also convert easily to PDF or PS, keeping the font embedded along ...
Other labels: convert, postscript, fonts, vector, solution, migration, rights
Page: Deduplication (Practical Preservation Issues)
Title: Deduplication. Detailed description: Collection owners need a way to easily identify duplicates in a collection. Duplicates are a common and seemingly simple issue, but the fact that it is rarely cracked illustrates its complexity. For a collection of several hundred it may be possible to identify them manually ...
Other labels: issue, duplication
Page: Extracting embedded objects from docx files (Practical Preservation Issues)
Title: Extracting embedded objects from docx files. Detailed description: We preserve MS Word documents as docx files. We are reasonably confident that the XML structure preserves the report text and structure well. We are not so confident about ...
Other labels: issue, embedded_objects
Page: Identifying content and Sorting (Practical Preservation Issues)
Title: Identifying content and Sorting. Detailed description: A detailed description of the Issue. The Issue MUST focus on the business or preservation-driven challenge, and should not assume or describe a particular solution. Gathering information about content, including file identification and metadata, to inform ...
Other labels: issue, obsolescence, unknown_characteristics, appraisal_assessment
Page: Identifying web content (Practical Preservation Issues)
Title: Identifying web content. Detailed description: The Web archives team at BnF has long suspected that the MIME types declared by the server for their pages' content were not accurate, which is a problem for preservation and could impede future emulation, for instance. Thus, when ...
Other labels: issue, identification, unknown_characteristics
Page: PDF to PDF-A conversion (Practical Preservation Issues)
Title: PDF to PDF-A conversion. Detailed description: The process of converting PDF files to PDF/A is one of our most time-consuming tasks. It is frustrating and not always successful. Also, the process cannot be run as a batch process. The conversion often fails (most common ...
Other labels: issue, qa
Page: Permission Overlays (Practical Preservation Issues)
Title: Permission Overlays. Detailed description: Disk images often contain information to which access restrictions should be applied. This can include deleted files, file fragments, Windows registry hives, and various system files. This particular ...
Other labels: issue, rights, permissions