Label: identification+webarchive

All content with label identification+webarchive.
Related Labels: hadoop, validation, spruce_glasgow, representationinformation, watch, tool, qa, fixity, fu-script, image, signatures, quality_assurance, scenario, fits, solution, multi, development, obsolescence, guide

Page: IS14 Diverse preservation risks in large archives with millions of objects (SCAPE)
Title: IS14 Diverse preservation risks in large archives with millions of objects. Detailed description: While we ingested millions of objects in the past, we expanded our knowledge about the risks of the objects. However, before we could make a decision ...
Other labels: characterisation, issue, watch, obsolescence
Page: IS17 Characterisation of text-based formats (SCAPE)
Title: IS17 Characterisation of text-based formats. Detailed description: Problem: it is increasingly common for scientific journal articles (which are usually in PDF format) to be accompanied by supplemental files. These are often research data, software source code, or scripts. In the majority of cases ...
Other labels: lsdr, issue, unknown_file_formats
Page: IS25 Web Content Characterisation (SCAPE)
Title: IS25 Web Content Characterisation. Detailed description: The main issue with web content is that web archive data is very heterogeneous. Depending on the policy of the institution, the data contains text documents in all kinds of text encodings, HTML content ...
Other labels: characterisation, issue, obsolescence
Page: IS26 Dealing with difficult identification cases (SCAPE)
Title: Dealing with difficult identification cases. Detailed description: Identification Requirements, Format Languages, Requirements and Difficult Cases. Mutants and wild types. Strains. See below for specific examples. Scalability Challenge: The solution must be able to identify and describe the large ...
Other labels: lsdr, issue, unknown_file_formats
Page: IS5 Digital objects archive contains unidentified content (SCAPE)
Title: Digital objects archive contains unidentified content. Detailed description: From an archiving point of view, if there is no detailed information about the exact content of an archive, no preservation planning or preservation actions can be undertaken. For example, if old ...
Other labels: characterisation, issue, watch, obsolescence
Page: IS7 Incompleteness and/or inconsistency of web archive data (SCAPE)
Title: Incompleteness and/or inconsistency of web archive data. Detailed description: The best practice for preserving websites is to crawl them with a web crawler such as Heritrix. However, crawling is a process that is highly susceptible to errors. Often, essential data is missed by the crawler ...
Other labels: characterisation, qa, issue, watch
Page: IS8 Diversity of office document formats in digital objects archive (SCAPE)
Title: Diversity of office document formats in digital objects archive. Detailed description: Document instances of many different file formats are referenced in web content. Many of these formats might not be renderable in a web archive viewer in the future. This relates especially to older ...
Other labels: characterisation, qa, watch, planning, issue, obsolescence
Page: UK Web Domain Dataset - Format Profile (SCAPE)
Title: British Library UK Web Domain Dataset: Format Profile. Description: MIME type records have been created for the UK Web Domain Dataset, using three sources/tools: the MIME types delivered by the server, Apache Tika ...
Other labels: dataset, formatprofile, representationinformation, researchdata
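
As an illustration of the format profile idea in the last entry, the following is a minimal sketch of how MIME types reported by several sources might be tallied and compared. It is not the British Library's method: the record layout, the source names `server` and `tika`, and the helper names `normalise` and `format_profile` are all assumptions made for this example, and no identification tool is actually invoked.

```python
from collections import Counter


def normalise(mime):
    """Lower-case a MIME type and drop parameters such as '; charset=utf-8'."""
    if not mime:
        return "unknown"
    return mime.split(";", 1)[0].strip().lower() or "unknown"


def format_profile(records):
    """Tally MIME types per source and count records whose sources disagree.

    `records` is an iterable of dicts mapping a source name to the MIME type
    that source reported (a hypothetical layout, not the dataset's schema).
    """
    per_source = {}
    disagreements = 0
    for record in records:
        seen = set()
        for source, mime in record.items():
            clean = normalise(mime)
            per_source.setdefault(source, Counter())[clean] += 1
            seen.add(clean)
        # More than one distinct value means the sources did not agree.
        if len(seen) > 1:
            disagreements += 1
    return per_source, disagreements


# Invented sample records, for illustration only.
sample = [
    {"server": "text/html; charset=UTF-8", "tika": "text/html"},
    {"server": "text/plain", "tika": "application/pdf"},
    {"server": None, "tika": "image/png"},
]
profile, mismatches = format_profile(sample)
```

Here a missing value is counted as `unknown`, so a record with no server-declared type is treated as a disagreement; whether that is the right choice depends on how the profile is meant to be read.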