Internet Memory Web collections
Description The data consists of web content crawled, stored and hosted by the Internet Memory Foundation in (W)ARC format (approx. 300 TB).
IM can also draw on its QA team to provide annotated data from this content, such as pairs of annotated snapshots for quality assurance scenarios.
1000 annotated pairs of web pages (similar/dissimilar) were produced as part of PC.WP3: Quality Assurance Components.
Licensing Web collections crawled on behalf of partner institutions require those institutions' agreement before SCAPE partners can use them.
Owner Internet Memory
Dataset Location Provided upon request
Collection expert Leïla Medjkoune (IM)
Issues brainstorm A bulleted list of possible preservation or business driven Issues. This is useful for describing ideas that might be turned into detailed Issues at a later date
List of Issues A list of links to detailed Issue pages relevant to this Dataset
