
| *Title* | {color:#000000}Comparing two web page versions for web archiving{color} |
| *Detailed description* | {color:#000000}The system combines: (1) structural and visual comparison methods embedded in a statistical discriminative model; (2) a visual similarity measure designed for Web pages that improves change detection; and (3) a supervised feature selection method adapted to Web archiving. A Support Vector Machine (SVM) model is trained on vectors of similarity scores computed between successive versions of pages. The trained model then classifies a pair of versions, represented by its vector of similarity scores, as similar or not similar. Experiments on real Web archives validate the approach.{color} |
| *[Solution Champion|SP:Responsibilities of the roles described on these pages]* \\ | [Sureda-Gutierrez Carlos|https://portal.ait.ac.at/sites/Scape/_layouts/userdisp.aspx?ID=144] (UPMC). |
| *Corresponding Issue(s)* \\ | [IS28 Structural and visual comparisons for web page archiving|SP:IS28 Structural and visual comparisons for web page archiving]\\
[IS7 Incompleteness and inconsistency of web archive data|SP:IS7 Incompleteness and and inconsistency of web archive data]\\
[SP:IS19 Migrate whole archive to new archiving system]\\ |
| *myExperiment Link* \\ | [MarcAlizer|http://www.myexperiment.org/workflows/2810.html]\\ |
| *Tool Registry Link* \\ | [TR:Pagelyzer]\\ |
| *Evaluation* \\ | TBD \\ |
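The classification step in the description (an SVM deciding from a vector of similarity scores whether two page versions are similar) can be illustrated with a small sketch. The description does not state the kernel, the training algorithm, or which scores make up the vector, so everything below is an assumption: a linear SVM trained with Pegasos-style stochastic subgradient descent (a standard primal SVM solver, swapped in here for illustration) and a hypothetical two-score vector [structural, visual] per pair of versions. It is not the MarcAlizer or Pagelyzer implementation, and the supervised feature selection step is omitted.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical feature layout: each pair of page versions is summarised by a
# vector of similarity scores, e.g. [structural_score, visual_score], where
# higher values mean the two versions look more alike.
Scores = Sequence[float]

SIMILAR, CHANGED = 1, -1


def _augment(x: Scores) -> List[float]:
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples: List[Tuple[Scores, int]], lam: float = 0.01,
                     epochs: int = 500, seed: int = 0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient descent. Labels are SIMILAR (+1) or CHANGED (-1)."""
    rng = random.Random(seed)
    data = [(_augment(x), y) for x, y in samples]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            # Margin test uses the weights from before this step's update.
            violates_margin = y * _dot(w, x) < 1.0
            # Gradient step on the L2 penalty: shrink all weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violates_margin:
                # Hinge loss is active for this pair: pull w towards y * x.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], scores: Scores) -> int:
    """Return SIMILAR if the decision value is non-negative, else CHANGED."""
    return SIMILAR if _dot(w, _augment(scores)) >= 0 else CHANGED


if __name__ == "__main__":
    # Toy, made-up training pairs: high scores = unchanged, low = changed.
    toy = [([0.95, 0.90], SIMILAR), ([0.90, 0.85], SIMILAR),
           ([0.85, 0.95], SIMILAR), ([0.92, 0.88], SIMILAR),
           ([0.20, 0.30], CHANGED), ([0.30, 0.10], CHANGED),
           ([0.10, 0.25], CHANGED), ([0.25, 0.20], CHANGED)]
    w = train_linear_svm(toy)
    print(predict(w, [0.95, 0.95]), predict(w, [0.05, 0.10]))
```

Folding the bias into the weight vector through a constant feature keeps each update to one shrink step plus an optional hinge step; a kernel SVM, as may be used in the actual system, would replace the dot product with a kernel evaluation.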