Title
Austrian National Library - Web Archive
Description The Austrian National Library uses representative datasets from its web archive:
- selective event crawls: sites harvested frequently during an event, e.g. the 2009 EU election or the 2010 Olympic Games,
- 2009 domain crawls covering about 1 million domains.

The web archive data is available in the ARC.GZ format.
The size of the ARC.GZ data set is 1377GB.
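
Because the data set is far too large to load into memory, a first look at it will usually stream the records one at a time. The following is a minimal sketch of that idea, not the tooling used by ONB or SCAPE. It assumes ARC version 1 record headers (five space-separated fields: URL, IP address, archive date, content type, length) and gzip members that Python's gzip module can read in sequence. The names ArcRecord, iter_arc_records and count_content_types are hypothetical and chosen only for illustration.

```python
import gzip
from collections import Counter
from typing import BinaryIO, Iterator, NamedTuple


class ArcRecord(NamedTuple):
    """Header fields of one ARC v1 record (hypothetical helper type)."""
    url: str
    ip: str
    date: str
    content_type: str
    length: int


def iter_arc_records(stream: BinaryIO) -> Iterator[ArcRecord]:
    """Yield the header of each record in a decompressed ARC v1 stream.

    Record bodies are skipped, so memory use stays small.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator line between records
        text = line.decode("utf-8", errors="replace").rstrip("\r\n")
        # Split from the right so a URL containing spaces stays in one piece.
        parts = text.rsplit(" ", 4)
        if len(parts) != 5:
            raise ValueError(f"unexpected ARC header line: {text!r}")
        url, ip, date, content_type, length_text = parts
        length = int(length_text)
        # Skip the record body in chunks instead of loading it at once.
        remaining = length
        while remaining > 0:
            chunk = stream.read(min(remaining, 1 << 20))
            if not chunk:
                raise ValueError(f"truncated record body for {url!r}")
            remaining -= len(chunk)
        yield ArcRecord(url, ip, date, content_type, length)


def count_content_types(path: str) -> Counter:
    """Count records per content type in one ARC.GZ file."""
    with gzip.open(path, "rb") as stream:
        return Counter(rec.content_type for rec in iter_arc_records(stream))
```

Calling count_content_types on a single ARC.GZ file returns a Counter keyed by content type. The first record of an ARC file is normally a filedesc:// version block, so it appears in the counts as well and may need to be filtered out.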

The metadata log file produced during the crawl process is available as a text file and has a size of 197GB.
Licensing Sample only available to SCAPE partners.
Owner Austrian National Library (ONB)
Collection expert Prändl-Zika Veronika (ONB)
Issues brainstorm  
List of Issues IS25 Web Content Characterisation
IS41 Analyse huge text files containing information about a web archive
Labels: arc, webarchive