


h2. Evaluator(s)

Radu Pop, Internet Memory



h2. Evaluation


IMF takes the quality of archived web sites into account. Quality is currently assured by visual inspection: the live site on the Internet is compared with the archived copy on the IMF servers.
To improve that process, IMF is developing an application that uses Markalizer, a tool developed by UPMC, to compare two images. The two images are produced by a Selenium-based framework (v2.24.1) that takes two snapshots: ideally, one from the archive access and the other from the live site.

This evaluation uses screenshots taken from the IMF Web Archive at two different dates.
Note also that only one node of the platform was used for this specific test.

Workflow:
1. Load a pair of Web Archive pages (two URLs given)
2. Take the screenshots (Selenium)
3. Compare the screenshots visually (Markalizer)
4. Produce the output result file (comparison score)
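
The four workflow steps above can be sketched in Python as follows. This is a minimal illustration, not the IMF application itself: the function names, the tab-separated output format and the `compare` callable (a stand-in for whatever invokes Markalizer, whose interface is not described here) are all assumptions. Only the Selenium calls (`webdriver.Firefox`, `get`, `save_screenshot`, `quit`) are real API.

```python
import os
import tempfile


def take_screenshot(url, png_path):
    """Steps 1-2: load `url` in Firefox through Selenium and save a PNG."""
    # Imported lazily so the rest of the sketch can be used without Selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        driver.save_screenshot(png_path)
    finally:
        driver.quit()


def compare_pair(url_a, url_b, compare, screenshot=take_screenshot):
    """Steps 1-3 for one pair of URLs; returns the score as a float.

    `compare` is a hypothetical callable taking two image paths; it stands
    in for the Markalizer invocation.
    """
    workdir = tempfile.mkdtemp(prefix="pair_")
    path_a = os.path.join(workdir, "a.png")
    path_b = os.path.join(workdir, "b.png")
    screenshot(url_a, path_a)
    screenshot(url_b, path_b)
    return float(compare(path_a, path_b))


def run_workflow(pairs, compare, out_path, screenshot=take_screenshot):
    """Step 4: write one line (url_a, url_b, score) per pair (assumed format)."""
    with open(out_path, "w") as out:
        for url_a, url_b in pairs:
            score = compare_pair(url_a, url_b, compare, screenshot)
            out.write("%s\t%s\t%r\n" % (url_a, url_b, score))
```

Passing the screenshot and comparison steps as parameters keeps the loop independent of the external tools, so it can be exercised with stand-ins when Firefox or Markalizer is not installed.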

*Goal / Sub-goal:*
*Performance efficiency / Throughput*
* Loading web pages can take time, depending on factors such as the complexity of the page, the Internet connection, the browser and browser version used, and the status of the remote servers.
* Taking the screenshot using Selenium
* Comparing the screenshots with Markalizer
* Overhead (preparation of the next comparison)
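
To see which of the stages listed above dominates the throughput, the time spent in each one can be accumulated separately. The sketch below is an assumption about how this could be instrumented, not a description of the IMF application; `StageTimer` and the stage names are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer(object):
    """Accumulate wall-clock seconds per named stage."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # Time the body of the `with` block and add it to the stage total.
        start = self.clock()
        try:
            yield
        finally:
            self.totals[name] += self.clock() - start

    def per_hour(self, n_comparisons):
        """Comparisons per hour implied by the total accumulated time."""
        elapsed = sum(self.totals.values())
        return n_comparisons * 3600.0 / elapsed if elapsed > 0 else 0.0
```

Each stage would be wrapped in its own block, for example `with timer.stage("load"): ...` followed by `with timer.stage("compare"): ...`, after which `timer.totals` holds the seconds spent per stage.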

*Reliability / Stability Indicators*
The external tools needed are:
* Selenium with Firefox (for this evaluation)
* Xvfb (a virtual X display server, needed to run Firefox on a virtual screen)
* Markalizer

The application is developed in Python.
All required components are installed separately (as package dependencies).
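
As an illustration of how Xvfb fits in, the sketch below starts a virtual display and points X clients such as Firefox at it through the `DISPLAY` environment variable. The display number `:99`, the screen geometry and the function names are assumptions chosen for the example; the `Xvfb <display> -screen 0 <W>x<H>x<D>` invocation is the standard Xvfb command line.

```python
import os
import subprocess


def xvfb_command(display=":99", width=1280, height=1024, depth=24):
    """Build the argument list that starts Xvfb with one virtual screen."""
    geometry = "%dx%dx%d" % (width, height, depth)
    return ["Xvfb", display, "-screen", "0", geometry]


def start_virtual_display(display=":99"):
    """Start Xvfb and make X clients launched from this process use it."""
    proc = subprocess.Popen(xvfb_command(display))
    os.environ["DISPLAY"] = display
    return proc  # call proc.terminate() once all screenshots are taken
```

A Firefox instance launched by Selenium after `start_virtual_display()` inherits `DISPLAY` and therefore renders on the virtual screen, so no physical display is needed on the node.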

*Reliability / Runtime stability*

The result of each comparison is a floating-point score that quantifies the differences between the two images.
| *Evaluation-Date* | *DD/MM/YY* | *01/11/2012* |
| *Platform-ID* | string | [Platform IMF |SP:Platform IMF 1]\\ |
| *Dataset(s)* | string | Pairs of urls from [IMF web archive|http://wiki.opf-labs.org/display/SP/Internet+Memory+Web+Archive]\\ |
| *Workflow method* | string | Python application wrapping and managing Selenium and the Markalizer tool |
| *Workflow(s) involved* | URL(s) | |
| *Tool(s) involved* | URL(s) | |
| *Link(s) to Scenario(s)* | URL(s) | [WCT1|http://wiki.opf-labs.org/display/SP/WCT1+Comparison+of+Web+Archive+pages]\\ |

*Platform IMF 1*
| *Field* | *Data type* | *Value* |
| *Platform-ID* | String | IMF Cluster |
| *Platform description* | String | Cloudera CDH3u2. \\ 3 dual-core low-consumption nodes |
| *Number of nodes* | integer | 3 |
| *Total number of physical CPUs* | integer | 3 |
| *CPU specs* | string | Dual-core AMD G-T56N at 1600 MHz |
| *Total number of CPU-cores* | integer | 6 Cores (3 * 2 Cores) |
| *Total amount of RAM in Gbytes* | integer | 24GB (3 * 8GB) |
| *average CPU-cores for nodes* | integer | 2 |
| *average RAM in Gbytes for nodes* | integer | 8 |
| *Operating System on nodes* | String | Debian 6 squeeze (64bit) |
| *Storage system/layer* | String | HDFS |
| *Network layer between nodes* | String | Local copy between two nodes: 80 MB/s (640 Mbps) |
*Evaluation points*
| *Metric* | *Baseline definition* | *Baseline value* | *Goal* | *Evaluation 1 (01/11/2012)* |
| *NumberOfObjectsPerHour* | Number of comparisons made per hour | 0 | 100 | 38 |
| *NumberOfFailedFiles* | Number of screenshot images that failed in the workflow | 0 | 0 | 0 |
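
To put the throughput figures in the table into perspective, the short calculation below converts comparisons per hour into seconds per comparison. It uses only the values from the table (38 measured, 100 as the goal); the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600.0

measured_per_hour = 38   # Evaluation 1 (01/11/2012)
goal_per_hour = 100      # goal from the table

seconds_per_comparison = SECONDS_PER_HOUR / measured_per_hour
seconds_budget_at_goal = SECONDS_PER_HOUR / goal_per_hour
required_speedup = float(goal_per_hour) / measured_per_hour

print("measured: %.1f s per comparison" % seconds_per_comparison)  # 94.7
print("goal:     %.1f s per comparison" % seconds_budget_at_goal)  # 36.0
print("speed-up needed: %.2fx" % required_speedup)                 # 2.63x
```

In other words, one comparison currently takes about 95 seconds on a single node, and reaching the goal would require bringing that down to 36 seconds, a speed-up of roughly 2.6 times.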

