Label: york_hackathon

Content with label york_hackathon in Practical Preservation Issues
Related Labels: solution, spruce_glasgow, convert, email, embedded_objects, identification, bit_rot, vector, obsolescence, duplication, issue, appraisal_assessment, data_capture, spruce, qa, harvesting, characterise, fonts, unknown_file_formats

Page: Ability to automatically identify script files
Detailed description: It is necessary to provide accurate, automated identification of various file format types in order to manage and preserve such objects effectively. At present, many file format identification tools do not identify ...
Other labels: issue, identification, unknown_file_formats
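One heuristic for script files, assumed here purely for illustration and not taken from the issue page, is to read the shebang line (`#!`) that many Unix scripts begin with. The sketch below uses a hypothetical `identify_script` function and a small, hypothetical interpreter table; scripts without a shebang line fall through and would still need another method.

```python
import os
from typing import Optional

# Hypothetical mapping from interpreter name to a human-readable script type.
INTERPRETERS = {
    "sh": "Shell script",
    "bash": "Bash script",
    "python": "Python script",
    "python3": "Python script",
    "perl": "Perl script",
    "ruby": "Ruby script",
}


def identify_script(data: bytes) -> Optional[str]:
    """Guess a script type from a '#!' shebang line; return None if there is none."""
    if not data.startswith(b"#!"):
        return None
    # Take the first line, drop the '#!' marker and split into words.
    first_line = data.splitlines()[0][2:].decode("ascii", errors="replace").strip()
    parts = first_line.split()
    if not parts:
        return None
    interpreter = os.path.basename(parts[0])
    # '#!/usr/bin/env python3' style: the real interpreter is the next word.
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1]
    return INTERPRETERS.get(interpreter, f"Script ({interpreter})")
```

For example, `identify_script(b"#!/usr/bin/env python3\n...")` resolves `env` to `python3` and reports a Python script.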
Page: Automatically extracting metadata for Grey Literature reports
Detailed description: Sometimes we receive batches of grey literature reports for which we have no metadata. This means we cannot include them in the grey literature library ...
Other labels: issue, unknown_characteristics, appraisal_assessment
Page: Checking that significant properties are preserved after migration
Detailed description: After a file migration (e.g. PDF to PDF/A, DOC to DOCX, DOC to PDF/A), we should check that the conversion has been successful and that the significant properties of the object have been maintained ...
Other labels: issue, qa
Page: Convert embedded fonts to outlines
I have an EPS file that contains text in a particular, unusual font. The font has been embedded, so the EPS renders correctly. I can also easily convert it to PDF or PS, keeping the font embedded along ...
Other labels: convert, postscript, fonts, vector, solution, migration, rights
Page: Deduplication
Detailed description: Collection owners need an easy way to identify duplicates in a collection. Duplicates are a common and seemingly simple issue, but the fact that the problem is rarely cracked illustrates its complexity. In a collection of several hundred items, it may be possible to identify duplicates manually ...
Other labels: issue, duplication
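For byte-identical duplicates, a common approach is to group files by size and then confirm matches with a checksum. The sketch below (hypothetical `find_duplicates` helper, SHA-256 chosen as an assumption) illustrates this; it deliberately catches only exact copies, so near-duplicates such as the same image saved in two formats are not found, which is part of why the issue is harder than it looks.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of byte-identical files found under root."""
    # Cheap first pass: only files of equal size can be identical.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups: list[list[Path]] = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue
        # Second pass: confirm identity with a cryptographic checksum.
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in candidates:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

The size pass avoids hashing most files in a large collection, since files with a unique size cannot have a duplicate.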
Page: Extracting embedded objects from docx files
Detailed description: We preserve MS Word documents as docx files. We are reasonably confident that the XML structure preserves the report text and structure well. We are not so confident about ...
Other labels: issue, embedded_objects
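A docx file is a ZIP package, and Word typically stores embedded objects under `word/embeddings/` and images under `word/media/`. The sketch below (hypothetical `extract_embedded` helper, folder list assumed rather than exhaustive) copies those parts out so they can be identified and preserved separately. It does not interpret the OLE containers themselves.

```python
import zipfile
from pathlib import Path

# Folders inside the package where Word typically stores embedded objects and images.
EMBEDDED_PREFIXES = ("word/embeddings/", "word/media/")


def extract_embedded(docx_path: Path, out_dir: Path) -> list[Path]:
    """Copy embedded parts of a .docx into out_dir/<folder>/<file name>."""
    extracted: list[Path] = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            prefix = next((p for p in EMBEDDED_PREFIXES if name.startswith(p)), None)
            file_name = Path(name).name
            # Skip other parts, directory entries and suspicious names.
            if prefix is None or name.endswith("/") or file_name in ("", ".", ".."):
                continue
            # Build the target from a fixed folder name plus the bare file name,
            # so a crafted part name cannot write outside out_dir.
            target = out_dir / prefix.split("/")[1] / file_name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(package.read(name))
            extracted.append(target)
    return extracted
```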
Page: Identifying content and Sorting
Detailed description: Gathering information about content, including file identification and metadata, to inform ...
Other labels: issue, obsolescence, unknown_characteristics, appraisal_assessment
Page: Identifying web content
Detailed description: The web archives team at BnF has long suspected that the MIME types declared by servers for page content are not accurate. This is a problem for preservation and could, for instance, impede future emulation. Thus, when ...
Other labels: issue, identification, unknown_characteristics
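One way to test such suspicions is to compare the server-declared Content-Type with a type sniffed from the payload's leading bytes. This is an illustrative sketch, not BnF's actual procedure: the signature table is a small assumed sample, and `sniff_mime` / `check_declared_type` are hypothetical names.

```python
from typing import Optional

# A few well-known leading-byte signatures (illustrative, far from complete).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime(payload: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a payload, or None if unknown."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return None


def check_declared_type(declared: str, payload: bytes) -> Optional[str]:
    """Return a warning if the declared Content-Type disagrees with the sniffed type."""
    # Drop parameters such as '; charset=utf-8' and normalise case.
    declared_mime = declared.split(";")[0].strip().lower()
    sniffed = sniff_mime(payload)
    if sniffed is not None and sniffed != declared_mime:
        return f"declared {declared_mime}, content looks like {sniffed}"
    return None
```

A naive table like this has known blind spots: ZIP-based formats such as docx or EPUB all sniff as `application/zip`, so a real check would need deeper inspection before flagging them.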
Page: PDF to PDF-A conversion
Detailed description: Converting PDF files to PDF/A is one of our most time-consuming tasks. It is frustrating and not always successful, and the process cannot be run in batch. The conversion often fails (most common ...
Other labels: issue, qa
Page: Permission Overlays
Detailed description: Disk images often contain information that should be subject to access restrictions. This can include deleted files, file fragments, Windows registry hives, and various system files. This particular ...
Other labels: issue, rights, permissions