| *Evaluator-ID* | email | [email protected] |
| *Evaluation description* | text | IMF takes the quality of its archived web sites into account. Quality is assured by visual inspection: the live site on the Internet is compared with the archived site on IMF servers. \\
To improve that process, IMF is developing an application that uses Pagelyzer, developed by UPMC, to compare two images. The two images are produced by a Selenium-based framework (v2.24.1) that takes two snapshots: ideally, one from the archive access and one from the live site. \\
\\
\\
Workflow: \\
1° Load the live page, take a screenshot (Selenium + Firefox headless) \\
2° Load the web page from the archive, take a screenshot (Selenium + Firefox headless) \\
3° Visual comparison of the screenshots (Pagelyzer) \\
4° Produce the output result file (comparison score) \\
\\
\\
*Goal / Sub-goal:* \\
*Performance efficiency / Throughput*
* Loading web pages takes time that depends on factors such as the complexity of the page, the Internet connection, the browser and browser version used, and the status of remote servers.
* Further time is spent taking the screenshots with Selenium, comparing them with Pagelyzer, and on overhead (preparation of the next comparison). \\
\\
\\
*Reliability / Stability Indicators* \\
The external tools needed are:
* Selenium with Firefox (for this evaluation)
* Xvfb (an X display server, needed to run Firefox on a virtual screen)
* Pagelyzer \\
The application is developed in Java and Ruby. \\
All required components are installed separately (as package dependencies). \\
\\
\\
*Reliability / Runtime stability*
* The result is a floating-point score that quantifies the differences detected between the two images |
| *Evaluation-Date* | DD/MM/YY | 01/09/2014 |
| *Platform-ID* | string | [Platform IMF 2|../../../../../../../../../../display/SP/Platform+IMF+2] |
| *Dataset(s)* | string | URLs from [IMF web archive|../../../../../../../../../../display/SP/Internet+Memory+Web+Archive] |
| *Dataset(s)* | string | Sample of 2.6 million URLs from |
| *Workflow method* | string | MapReduce job using Selenium and Pagelyzer internally \\ |
| *Workflow(s) involved* | URL(s) | \\ |
| *Tool(s) involved* | URL(s) | \\ |
| *Link(s) to Scenario(s)* | URL(s) | [WCT1|../../../../../../../../../../display/SP/WCT1+Comparison+of+Web+Archive+pages] |
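
The four workflow steps in the evaluation description can be sketched as follows. This is only an illustrative outline, not the IMF application: the class `SnapshotComparison`, its method names, and the tab-separated result layout are assumptions made for this example. The `capture` function stands in for the Selenium + Firefox screenshot step and the `scorer` function stands in for the Pagelyzer comparison; neither tool's real API is shown here.

```java
import java.util.Locale;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of the workflow; names and layout are assumptions.
final class SnapshotComparison {

    // Steps 1-3: capture both pages, then score the pair of images.
    // `capture` is a placeholder for Selenium + Firefox (running under Xvfb),
    // `scorer` is a placeholder for the Pagelyzer comparison.
    static double compare(String liveUrl, String archiveUrl,
                          Function<String, byte[]> capture,
                          ToDoubleBiFunction<byte[], byte[]> scorer) {
        byte[] liveShot = capture.apply(liveUrl);           // step 1: live page
        byte[] archiveShot = capture.apply(archiveUrl);     // step 2: archived page
        return scorer.applyAsDouble(liveShot, archiveShot); // step 3: comparison
    }

    // Step 4: one result line per URL pair (assumed tab-separated layout).
    static String resultLine(String liveUrl, String archiveUrl, double score) {
        return String.format(Locale.ROOT, "%s\t%s\t%.4f", liveUrl, archiveUrl, score);
    }
}
```

Passing the capture and scoring steps in as functions keeps the outline independent of any particular browser driver or comparison tool, so each step can be timed or replaced separately when measuring throughput.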

| *Platform-ID* | String | IMF Cluster 2 |
| *Platform description* | String | Cloudera CDH4.6 \\
43 nodes |
| *Number of nodes* | integer | 43 |
| *Total number of physical CPUs* | integer | 43 |
| *CPU specs* | string | 15 × dual-core AMD G-T56N at 1600 MHz, \\
28 × Intel(R) Core(TM) i5-3470S CPU @ 2.90GHz |
| *Total number of CPU-cores* | integer | 142 cores (15 × 2 cores + 28 × 4 cores) |