Page: IS12 ARC to WARC migration (SCAPE)
Title: IS12 ARC to WARC migration. Detailed description: Migration from ARC to WARC is desirable as the WARC format is better suited for the future of web archiving. Scalability Challenge: ARC and WARC are both container formats. At present SB has ...
Page: IS7 Incompleteness and and inconsistency of web archive data (SCAPE)
Title: Incompleteness and/or inconsistency of web archive data. Detailed description: The best practice for preserving websites is to crawl them using a web crawler such as Heritrix. However, crawling is a process that is highly susceptible to errors. Often, essential data is missed by the crawler ...
Page: IS8 Diversity of office document formats in digital objects archive (SCAPE)
Title: Diversity of office document formats in digital objects archive. Detailed description: Document instances of many different file formats are referenced in web content. Many of these formats might not be renderable in a web archive viewer in the future. This relates especially to older ...
Page: Pagelyzer (The Registry)
Summary. Purpose: A tool for comparing web pages using a structural and visual approach. The research challenge for this tool is the learning algorithm based on frequency. Homepage, Source Code Repository, License: As Is, Debian Package ...
