
| *Title* | French Web Archives |
| *Description* | Several hundred TB of data representing 15 years of harvesting the French web. The harvesting was outsourced at first and is now done in-house. The collection combines large-scale harvests of the .fr domain with in-depth harvests of a curated selection of websites. The data is encapsulated in ARC files ([http://www.archive.org/web/researcher/ArcFileFormat.php|http://www.archive.org/web/researcher/ArcFileFormat.php]) |
| *Licensing* | This data is obtained by the French National Library (BnF) through legal deposit and can only be accessed on the library premises. |
| *Owner* | French National Library (Bibliothèque nationale de France) |
| *Dataset Location* | N/A |
| *Collection Champion* | Louise Fauduet (louisedotfauduetatbnfdotfr) |
| *Issues brainstorm* | MIME types declared by the web servers at harvest time do not always match the results of the {{file}} utility when it is run on the archived content. |
| *List of Issues* | [Identifying web content]\\ |
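The following is a minimal Python sketch of how declared-versus-detected MIME mismatches might be flagged record by record. It makes several assumptions. First, the records use ARC version-1 URL-record headers: five fields (URL, IP address, 14-digit archive date, content type, length) separated by single spaces, as described in the ARC format document linked above. Second, the payload passed in is the response body with any HTTP headers already stripped. Third, the signature table is a hypothetical, deliberately tiny stand-in for the magic database that {{file}} actually uses. All function and class names are illustrative, not part of any BnF tool.

```python
from typing import NamedTuple, Optional


class ArcV1Header(NamedTuple):
    """Fields of an ARC version-1 URL-record header line (assumed layout)."""
    url: str
    ip_address: str
    archive_date: str  # 14 digits, YYYYMMDDhhmmss
    content_type: str  # MIME type as declared by the web server
    length: int


def parse_arc_v1_header(line: str) -> ArcV1Header:
    """Parse one header line; assumes exactly five single-space-separated fields."""
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    url, ip, date, ctype, length = fields
    return ArcV1Header(url, ip, date, ctype, int(length))


# Hypothetical, illustrative subset of magic-number signatures. The real
# `file` utility relies on a much larger database and more elaborate tests.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(payload: bytes) -> Optional[str]:
    """Guess a MIME type from leading bytes; return None if nothing matches."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    # Crude HTML heuristic: look at the first 512 bytes, ignoring leading whitespace.
    head = payload[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return None


def declared_type_matches(header: ArcV1Header, payload: bytes) -> Optional[bool]:
    """True if the sniffed type agrees with the declared one, False if it
    disagrees, None if the payload could not be identified."""
    detected = sniff_mime(payload)
    if detected is None:
        return None
    # Drop parameters such as ";charset=..." and normalise case before comparing.
    declared = header.content_type.split(";", 1)[0].strip().lower()
    return declared == detected
```

For example, a record whose server declared {{text/html}} but whose body starts with a GIF signature would be flagged:

```python
hdr = parse_arc_v1_header("http://www.example.fr/logo.gif 192.0.2.1 20050101120000 text/html 6\n")
print(declared_type_matches(hdr, b"GIF89a"))  # False
```

Returning {{None}} for unidentified payloads, rather than counting them as mismatches, keeps "unknown" separate from "contradicts the declared type", which matters when the detector covers only a few formats.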