|| Govdocs1 Open Corpus
|Description|| A corpus of 1 million documents in various formats, drawn from US government web sites and freely available for research.
|Licensing|| None. Free to use and distribute.
|Dataset Location|| http://digitalcorpora.org/corpora/files
* This dataset contains 231,683 PDFs totaling 127.8 GB.