French Web Archives

Description Several hundred TB of data representing 15 years of harvesting the French web. The harvesting was outsourced at first and is now done in-house. It is a mix of large-scale harvests of the .fr domain and in-depth harvests of a curated selection of websites. The data is encapsulated in ARC files (
Licensing This data is obtained by the French National Library (BnF) through legal deposit and can only be accessed on the library premises.
Owner French National Library (Bibliothèque nationale de France)
Dataset Location N/A
Collection Champion Louise Fauduet (louisedotfauduetatbnfdotfr)
Issues brainstorm MIME types declared by the web servers at the time of harvest do not always match the results of the file utility when it is run on the archives.
List of Issues Identifying web content
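
The MIME type mismatch noted under "Issues brainstorm" can be illustrated with a minimal sketch. It assumes each harvested record has already been reduced to a (URL, declared MIME type, payload bytes) triple; the function names, the sample URLs, and the small signature table are hypothetical. The table is only a simplified stand-in for the file utility, not its actual detection logic.

```python
# Hypothetical, simplified stand-in for the `file` utility:
# a handful of well-known magic-byte signatures.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_mime(payload):
    """Return a MIME type guessed from the leading bytes, or None if unknown."""
    for signature, mime in MAGIC_SIGNATURES:
        if payload.startswith(signature):
            return mime
    return None


def normalize(declared):
    """Drop parameters such as '; charset=UTF-8' and lowercase the type."""
    return declared.split(";", 1)[0].strip().lower()


def find_mismatches(records):
    """List (url, declared, detected) where the two MIME types disagree.

    Records whose payload matches no known signature are skipped,
    since nothing can be concluded about them.
    """
    mismatches = []
    for url, declared, payload in records:
        detected = sniff_mime(payload)
        declared_type = normalize(declared)
        if detected is not None and detected != declared_type:
            mismatches.append((url, declared_type, detected))
    return mismatches


# Illustrative records only; these are not taken from the archive.
SAMPLE_RECORDS = [
    ("http://example.fr/report", "text/html; charset=UTF-8", b"%PDF-1.4\n"),
    ("http://example.fr/logo.png", "image/png", b"\x89PNG\r\n\x1a\n\x00"),
    ("http://example.fr/page", "text/html", b"<html></html>"),
]

if __name__ == "__main__":
    for url, declared, detected in find_mismatches(SAMPLE_RECORDS):
        print(f"{url}: declared {declared}, detected {detected}")
```

In practice the detection step would be delegated to the file utility itself, and the records would be read from the ARC files; both are left out here to keep the sketch self-contained.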
Labels dataset, web, mixed