| *Title* \\ | *Govdocs1 Open Corpus* \\ |
| *Description* | A corpus of 1 million documents in various formats, drawn from US government web sites and freely available for research. \\ |
| *Licensing* | None. Free to use and distribute. \\ |
| *Owner* | N/A \\ |
| *Dataset Location* | [http://digitalcorpora.org/corpora/files] \\ |
| *Collection expert* | N/A |
\* This dataset contains 231,683 PDFs, which total 127.8 GB.