IS39 Format obsolescence detection
Detailed description Format obsolescence detection within large amounts of Web data
Scalability Challenge
The current size of our archive is around 200 TB and it is growing rapidly.
Issue champion Leïla Medjkoune (IM)
Other interested parties
Possible Solution approaches Use of image comparison to detect rendering errors within the web archive: compare reference snapshots of web pages rendered in different browser versions (WP11-WP12)
Use of pattern recognition to establish a complex signature for an HTML page (WP9)
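The two approaches above could be prototyped roughly as follows. This is only a minimal sketch under stated assumptions, not the WP9 or WP11-WP12 tooling: it assumes the page snapshots have already been rendered and decoded into equally sized grids of grayscale values (0-255), and it uses a plain tag-frequency count as a stand-in for a "complex signature". The names `diff_ratio`, `tag_signature` and the `tolerance` parameter are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


def diff_ratio(snapshot_a, snapshot_b, tolerance=16):
    """Return the fraction of pixels that differ by more than `tolerance`.

    Each snapshot is assumed to be a list of rows of grayscale values
    (0-255) from the same page rendered in two browser versions.
    """
    if len(snapshot_a) != len(snapshot_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(snapshot_a, snapshot_b)
    ):
        raise ValueError("snapshots must have identical dimensions")
    total = sum(len(row) for row in snapshot_a)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_a, row_b in zip(snapshot_a, snapshot_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if abs(pixel_a - pixel_b) > tolerance
    )
    return changed / total


class _TagCounter(HTMLParser):
    """Counts how often each start tag occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def tag_signature(html):
    """Return a simple signature of a page: a mapping of tag name to count."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.tags)
```

A high `diff_ratio` between renderings of the same archived page would flag it as a candidate for manual inspection, and pages whose `tag_signature` contains tags such as `embed` or `applet` could be prioritised for that comparison. Both thresholds and tag lists would need to be tuned on the dataset.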
Lessons Learned
Training Needs
Datasets Subset of IM Web Archive


Objectives Which SCAPE objectives do this issue and a future solution relate to? e.g. scalability, robustness, reliability, coverage, preciseness, automation
Success criteria Describe the success criteria for solving this issue - what are you able to do? - what does the world look like?
Automatic measures What automated measures would you like the solution to provide so that it can be evaluated for this specific issue? Which measures are important?
If possible specify very specific measures and your goal - e.g.
 * process 50 documents per second
 * handle 80 GB files without crashing
 * identify 99.5% of the content correctly
Manual assessment Apart from the automated measures you would like to get, do you foresee any manual assessment being necessary to evaluate the solution to this issue?
If possible specify measures and your goal - e.g.
 * Solution installable with basic Linux system administration skills
 * User interface understandable by non-developer curators
Actual evaluations Links to actual evaluations of this Issue/Scenario