
| *Title* \\ | Austrian National Library - Web Archive |
| *Description* | The Austrian National Library uses representative datasets from its web archive: \\
\- selective event crawls: sites harvested frequently during an event, e.g. the EU election 2009 or the Olympic Games 2010, \\
\- domain crawls from 2009 covering about 1 million domains. \\
\\
The web archive data is available in the ARC.GZ format. \\
The size of the ARC.GZ data set is 1377GB. \\
\\
The metadata log file produced during the crawl process is available as a text file and has a size of 197GB. \\ |
| *Licensing* | Sample only available to SCAPE partners. \\ |
| *Owner* | Austrian National Library (ONB) \\ |
| *Collection expert* | [Prändl-Zika Veronika|https://portal.ait.ac.at/sites/Scape/TB/_layouts/userdisp.aspx?ID=92] (ONB) \\ |
| *Issues brainstorm* | |
| *List of Issues* | [SP:IS25 Web Content Characterisation]\\
[SP:IS41 Analyse huge text files containing information about a web archive] |