Label: webarchive

All content with label webarchive.
Related Labels: planning, hadoop, lsdr, representationinformation, characterisation, watch, identification, obsolescence, issue, tool, qa, formatprofile, arc, database, researchdata, unknown_characteristics, unknown_file_formats, dataset, scenario

Page: IS7 Incompleteness and/or inconsistency of web archive data (SCAPE)
Title: Incompleteness and/or inconsistency of web archive data. Detailed description: The best practice for preserving websites is to crawl them with a web crawler such as Heritrix. However, crawling is highly susceptible to errors. Often, essential data is missed by the crawler ...
Other labels: characterisation, qa, identification, issue, watch
Page: IS8 Diversity of office document formats in digital objects archive (SCAPE)
Title: Diversity of office document formats in digital objects archive. Detailed description: Document instances of many different file formats are referenced in web content. Many of these formats might not be renderable in a web archive viewer in the future. This relates especially to older ...
Other labels: characterisation, identification, qa, watch, planning, issue, obsolescence
Page: Moved to UserStories… WCT3 Characterise web content in ARC and WARC containers at State and University Library Denmark (SCAPE)
Moved to https://sbprojects.statsbiblioteket.dk/jira/secure/IssueNavigator.jspa?mode=hide&requestId=10829 Collection: Issue: Solutions
Other labels: scenario
Page: Pagelyzer (The Registry)
Summary. Purpose: Tool for comparing web pages using a structural and visual approach. The research challenge for this tool is the learning algorithm based on frequency. Source Code Repository: https://github.com/openplanets/pagelyzer License: As Is. Debian Package: http://deb.openplanetsfoundation.org ...
Other labels: characterisation, qa, tool
Page: UK Web Domain Dataset - Format Profile (SCAPE)
Title: British Library UK Web Domain Dataset: Format Profile. Description: MIME type records have been created for the UK Web Domain Dataset using three sources/tools: the MIME types delivered by the server, Apache Tika ...
Other labels: dataset, formatprofile, representationinformation, identification, researchdata
Page: WCT1 Comparison of Web Archive pages (SCAPE)
Dataset: Issue: Solutions
Other labels: scenario
Page: WCT2 ARC to WARC migration (SCAPE)
Collection: Issue: Solutions
Other labels: scenario
Page: WCT4 Web Archive Mime-Type detection at Austrian National Library (SCAPE)
Collection: Issue: Solutions
Other labels: scenario
Page: WCT6 (W)ARC to HBase migration (SCAPE)
Collection: Issue: Solutions
Other labels: scenario
Page: WCT7 Format obsolescence detection (SCAPE)
Collection: Issue: Solutions
Other labels: scenario