| *Title* \\ | IS39 Format obsolescence detection |
| *Detailed description* | Format obsolescence detection within large amounts of Web data |
| *Scalability Challenge* \\ | _Current size of our archive is around 200TB and is growing rapidly._ |
| *[Issue champion|SP:Responsibilities of the roles described on these pages]* | [Leïla Medjkoune|https://portal.ait.ac.at/sites/Scape/Management/_layouts/userdisp.aspx?ID=69&Source=https%3A%2F%2Fportal.ait.ac.at%2Fsites%2FScape%2FManagement%2F_layouts%2Fpeople.aspx%3FMembershipGroupId%3D5] (IM) \\ |
| *Other interested parties* \\ | |
| *Possible Solution approaches* | _Use of image comparison to detect rendering errors within the web archive: compare reference snapshots of web pages rendered in different browser versions (WP11-WP12)_ \\
Use of pattern recognition to establish a complex signature for an HTML page (WP9) \\ |
| *Context* | \\ |
| *Lessons Learned* | \\ |
| *Training Needs* | \\ |
| *Datasets* | [Subset of IM Web Archive|http://wiki.opf-labs.org/display/SP/Internet+Memory+Web+Archive]\\ |
| *Solutions* | |
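The image-comparison approach listed above could start from a simple pixel-difference check between a reference snapshot and a snapshot of the same page rendered in another browser version. The sketch below is illustrative only. It assumes the screenshots have already been captured and decoded into rows of grayscale values (0-255). The names {{diff_ratio}} and {{looks_broken}} and the {{tolerance}} and {{max_ratio}} thresholds are hypothetical and uncalibrated. A production tool would more likely rely on an image library and a perceptual metric.

```python
from typing import Sequence

# Hypothetical representation: a decoded screenshot as rows of grayscale values (0-255).
Snapshot = Sequence[Sequence[int]]


def same_shape(a: Snapshot, b: Snapshot) -> bool:
    """True when both snapshots have the same number of rows and row lengths."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def diff_ratio(reference: Snapshot, candidate: Snapshot, tolerance: int = 16) -> float:
    """Fraction of pixels whose grayscale values differ by more than `tolerance`."""
    if not same_shape(reference, candidate):
        raise ValueError("snapshots must have identical dimensions")
    total = sum(len(row) for row in reference)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for ref_row, cand_row in zip(reference, candidate)
        for a, b in zip(ref_row, cand_row)
        if abs(a - b) > tolerance
    )
    return differing / total


def looks_broken(reference: Snapshot, candidate: Snapshot,
                 tolerance: int = 16, max_ratio: float = 0.05) -> bool:
    """Flag a rendering as suspect if its size changed or too many pixels differ.

    The default thresholds are placeholders, not values validated on web archive data.
    """
    if not same_shape(reference, candidate):
        return True  # a changed page size is treated as a likely rendering problem
    return diff_ratio(reference, candidate, tolerance) > max_ratio
```

For example, with a 2x2 white reference, changing one pixel to black gives a difference ratio of 0.25, and the page is flagged. Small shifts within the tolerance, such as 255 versus 250, are ignored. This keeps the check cheap enough to run over a large crawl as a first filter before any manual review.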
h1. Evaluation
| *Objectives* | _Which SCAPE objectives do this issue and a future solution relate to? e.g. scalability, robustness, reliability, coverage, precision, automation_ |
| *Success criteria* | _Describe the success criteria for solving this issue - what are you able to do? - what does the world look like?_ |
| *Automatic measures* | _What automated measures would you like the solution to provide for evaluating this specific issue? Which measures are important?_ \\
_If possible, specify concrete measures and your goal, e.g._ \\
_ \* process 50 documents per second_ \\
_ \* handle 80 GB files without crashing_ \\
_ \* identify 99.5% of the content correctly_ \\ |
| *Manual assessment* | _Apart from the automated measures, do you foresee any manual assessment being necessary to evaluate the solution to this issue?_ \\
_If possible, specify measures and your goal, e.g._ \\
_ \* Solution installable with basic Linux system administration skills_ \\
_ \* User interface understandable by non-developer curators_ \\ |
| *Actual evaluations* | Links to actual evaluations of this Issue/Scenario |