Title | Internet Memory Web collections |
Description | The data consists of web content crawled, stored and hosted by the Internet Memory Foundation in (W)ARC format (approx. 300 TB). Using this content, IM can also use its task force (QA team) to provide annotated data, such as pairs of annotated snapshots for quality assurance scenarios. 1000 annotated pairs of web pages (similar/dissimilar) were produced as part of PC.WP3: Quality Assurance Components. |
Licensing | Web collections crawled on behalf of partner institutions require those institutions' agreement before they can be used by SCAPE partners |
Owner | Internet Memory |
Dataset Location | Provided upon request |
Collection expert | Leïla Medjkoune |
Issues brainstorm | A bulleted list of possible preservation or business-driven Issues. This is useful for describing ideas that might be turned into detailed Issues at a later date |
List of Issues | A list of links to detailed Issue pages relevant to this Dataset |