Govdocs1 Open Corpus
Description A corpus of 1 million documents in various formats, drawn from US government web sites and freely available for research.
Licensing None. Free to use and distribute.
Owner N/A
Dataset Location

Collection expert N/A
Issues brainstorm Should act as a representative corpus for web archive testing.
List of Issues A list of links to detailed Issue pages relevant to this Dataset
