Label: characterisation+webarchive

All content with label characterisation+webarchive.
Related Labels: sox, word, xena, 3gpp, jpg, jhears, qa, fu-script, audio, quality_assurance, gif, scenario, video, obsolescence, keyword, arc, apache, apache2, api

Page: IS14 Diverse preservation risks in large archives with millions of objects (SCAPE)
Title: IS14 Diverse preservation risks in large archives with millions of objects. Detailed description: While we ingested millions of objects in the past, we expanded our knowledge about the risks of the objects. However, before we could make a decision ...
Other labels: identification, issue, watch, obsolescence
Page: IS25 Web Content Characterisation (SCAPE)
Title: IS25 Web Content Characterisation. Detailed description: The main issue with web content is that web archive data is very heterogeneous. Depending on the policy of the institution, data contains text documents in all kinds of text encodings, HTML content ...
Other labels: identification, issue, obsolescence
Page: IS41 Analyse huge text files containing information about a web archive (SCAPE)
Title: IS41 Analyse huge text files containing information about a web archive. Detailed description: Some web archives produce information about the content of a web archive on a periodic basis. The result is sometimes stored as huge text files ...
Other labels: issue, hadoop, unknown_characteristics
Page: IS5 Digital objects archive contains unidentified content (SCAPE)
Title: Digital objects archive contains unidentified content. Detailed description: From an archiving point of view, if there is no detailed information about the exact content of an archive, no preservation planning or any preservation actions can be undertaken. For example, if old ...
Other labels: identification, issue, watch, obsolescence
Page: IS6 Determine render-ability of displayable web objects (SCAPE)
Title: Determine renderability of displayable web objects. Detailed description: Whether a digital object is renderable depends on standards, agreements, and understandings in interfaces and hardware, and there are strong interdependencies between these conditions. Because of these technical dependencies, the content of the web archive might not be renderable ...
Other labels: issue, obsolescence
Page: IS7 Incompleteness and inconsistency of web archive data (SCAPE)
Title: Incompleteness and/or inconsistency of web archive data. Detailed description: The best practice for preserving websites is to crawl them using a web crawler such as Heritrix. However, crawling is a process that is highly susceptible to errors. Often, essential data is missed by the crawler ...
Other labels: qa, identification, issue, watch
Page: IS8 Diversity of office document formats in digital objects archive (SCAPE)
Title: Diversity of office document formats in digital objects archive. Detailed description: Document instances of many different file formats are referenced in web content. Many of these formats might not be renderable in a web archive viewer in the future. This relates especially to older ...
Other labels: identification, qa, watch, planning, issue, obsolescence
Page: Pagelyzer (The Registry)
Summary. Purpose: Tool for comparing web pages based on a structural and visual approach. The research challenge for this tool is the learning algorithm based on frequency. Homepage: (none listed). Source Code Repository: https://github.com/openplanets/pagelyzer License: As Is. Debian Package: http://deb.openplanetsfoundation.org ...
Other labels: qa, tool
Page: WCT8 Huge text file analysis using hadoop (SCAPE)
Collection: Issue: Solutions
Other labels: scenario, hadoop