Label: identification

Content with label identification in SCAPE
Related Labels: planning, solution, lsdr, representationinformation, characterisation, watch, obsolescence, issue, qa, formatprofile, webarchive, researchdata, unknown_file_formats, dataset

Page: IS14 Diverse preservation risks in large archives with millions of objects
Title: IS14 Diverse preservation risks in large archives with millions of objects. Detailed description: While we ingested millions of objects in the past, we expanded our knowledge about the risks of the objects. However, before we could make a decision ...
Other labels: webarchive, characterisation, issue, watch, obsolescence
Page: IS17 Characterisation of text-based formats
Title: IS17 Characterisation of text-based formats. Detailed description: Problem: it is increasingly common for scientific journal articles (which are usually in PDF format) to be accompanied by supplemental files. These are often research data, software source code, or scripts. In the majority of cases ...
Other labels: lsdr, webarchive, issue, unknown_file_formats
Page: IS22 Characterise and Validate very large mpeg-1 and mpeg-2 files
Title: IS22 Characterise and Validate very large mpeg-1 and mpeg-2 files. Detailed description: Collections of very large video files (50 GB each) are hard to handle when it comes to characterisation and validation. Known characterisation tools do not necessarily like very ...
Other labels: characterisation, lsdr, issue, obsolescence
Page: IS25 Web Content Characterisation
Title: IS25 Web Content Characterisation. Detailed description: The issue with web content is mainly the fact that web archive data is very heterogeneous. Depending on the policy of the institution, data contains text documents in all kinds of text encoding, HTML content ...
Other labels: characterisation, webarchive, issue, obsolescence
Page: IS26 Dealing with difficult identification cases
Title: Dealing with difficult identification cases. Detailed description: Identification Requirements, Format Languages, Requirements and Difficult Cases. Mutants and wild types. Strains. See below for specific examples. Scalability Challenge: The solution must be able to identify and describe the large ...
Other labels: webarchive, lsdr, issue, unknown_file_formats
Page: IS5 Digital objects archive contains unidentified content
Title: Digital objects archive contains unidentified content. Detailed description: From an archiving point of view, if there is no detailed information about the exact content of an archive, no preservation planning or any preservation actions can be undertaken. For example, if old ...
Other labels: characterisation, webarchive, issue, watch, obsolescence
Page: IS7 Incompleteness and/or inconsistency of web archive data
Title: Incompleteness and/or inconsistency of web archive data. Detailed description: The best practice for preserving websites is to crawl them using a web crawler like Heritrix. However, crawling is a process that is highly susceptible to errors. Often, essential data is missed by the crawler ...
Other labels: webarchive, characterisation, qa, issue, watch
Page: IS8 Diversity of office document formats in digital objects archive
Title: Diversity of office document formats in digital objects archive. Detailed description: Document instances of many different file formats are referenced in web content. Many of these formats might not be renderable in a web archive viewer in the future. This relates especially to older ...
Other labels: webarchive, characterisation, qa, watch, planning, issue, obsolescence
Page: SO17 Web Archive Mime-Type detection workflow based on Droid and Apache Tika
Title: SO17 Web Archive Mime-Type detection workflow based on Droid and Apache Tika. Detailed description: An experimental workflow has been implemented using Taverna Workbench. Due to the large amount of local data to be processed, the workflow is using ...
Other labels: solution
Page: UK Web Domain Dataset - Format Profile
Title: British Library UK Web Domain Dataset: Format Profile. Description: MIME type records have been created for the UK Web Domain Dataset, using three sources/tools: the MIME types delivered by the server; Apache Tika ...
Other labels: dataset, webarchive, formatprofile, representationinformation, researchdata