Title |
French Web Archives |
Description |
Several hundred TB of data representing 15 years of harvesting the French web. The harvesting was outsourced at first and is now done in-house. The collection is a mix of large-scale harvests of the .fr domain and in-depth harvests of a curated selection of websites. The data is encapsulated in ARC files (http://www.archive.org/web/researcher/ArcFileFormat.php) |
Licensing |
This data is obtained by the French National Library (BnF) through legal deposit and can only be accessed on the library premises. |
Owner |
French National Library (Bibliothèque nationale de France) |
Dataset Location |
N/A |
Collection Champion |
Louise Fauduet (louisedotfauduetatbnfdotfr) |
Issues brainstorm |
MIME types declared by the web servers at the time of harvest do not always match the results of the file utility when it is run on the archives. |
List of Issues |
Identifying web content |
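The MIME type mismatch noted under "Issues brainstorm" can be illustrated with a minimal Python sketch. It is not the library's workflow and it does not call the file utility: a small, hypothetical signature table (`SIGNATURES`) stands in for file's much larger magic database, and the function names `normalize_mime`, `sniff_mime`, and `mime_mismatch` are assumptions made for illustration. The sketch only shows the comparison step, given a server-declared type and the first bytes of a harvested payload.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table (an assumption for this
# sketch); the real file utility relies on a far larger magic database.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def normalize_mime(declared: str) -> str:
    """Drop parameters such as '; charset=...' and lowercase the type."""
    return declared.split(";", 1)[0].strip().lower()


def sniff_mime(payload: bytes) -> Optional[str]:
    """Guess a MIME type from leading bytes, or return None if unknown."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    # Very rough HTML check on the start of the payload.
    head = payload[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return None


def mime_mismatch(declared: str, payload: bytes) -> bool:
    """True when a type was detected and it differs from the declared one."""
    detected = sniff_mime(payload)
    return detected is not None and detected != normalize_mime(declared)


if __name__ == "__main__":
    # A PDF served with an HTML content type is reported as a mismatch.
    print(mime_mismatch("text/html; charset=ISO-8859-1", b"%PDF-1.4\n"))
```

Payloads whose type cannot be detected are not counted as mismatches here; whether such records should be flagged separately is a design choice left open by this sketch.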