Page: Database containing a unique list of Danish words (Practical Preservation Issues)
Title \\ Database containing a unique list of Danish words Description The dataset is based on a web scrape of selected Danish websites, extraction of words from the webpage <body></body> and insertion of unique words into a MySQL database. The words are enriched with information on the surrounding ...
Other labels: dataset, opf, opf_copenhagen, document
Page: French Web Archives (Practical Preservation Issues)
Title \\ French Web Archives Description Several hundred TB of data representing 15 years of harvesting the French web. The harvesting was outsourced at first and is now done in-house. It is a mix of large-scale harvests of the .fr domain and in-depth harvests of a curated ...
Other labels: dataset, mixed
Page: Ida Roper Herbarium archive (Practical Preservation Issues)
Title \\ Ida Roper Herbarium archive \\ Description The Roper archive consists of approximately 10,000 specimens of English plants. The digital archive donated to Leeds represents the outputs of an Arts and Humanities Research Board project from 2003, to improve access to the Ida ...
Other labels: dataset, mixed, document, image, database
Page: Internet Memory Web Archive (SCAPE)
Title \\ Internet Memory Web collections \\ Description The data consists of web content crawled, stored and hosted by the Internet Memory Foundation in (W)ARC format (approx. 300 TB) \\ Using this content, IM can also use its taskforce (QA ...
Other labels: dataset
Page: Malta Music Memory Project (M3P) (SPRUCE)
Title \\ Malta Music Memory Project (M3P) \\ Description The main goal of the M3P is to provide an inclusive repository for memories of Malta's music and associated arts, ensuring that these are kept for posterity for current and future generations. Licensing For IPR issues see http ...
Other labels: dataset, spruce, spruce_glasgow