Radu Pop, Internet Memory
IMF takes the quality of its archived web sites into account. Quality is currently assured by visual inspection: the site on the live Internet is compared with the archived site on the IMF servers.
To improve this process, IMF is developing an application that uses the Markalizer tool, developed by UPMC, to compare two images. The images are produced by a Selenium-based framework (v2.24.1), which takes two snapshots: ideally, one from the archive access and one from the live web.
This evaluation uses screenshots taken from the IMF Web Archive at two different dates.
Note also that for this specific test, only one node of the platform was used.
1° Load a pair of Web Archive pages (two URLs given)
2° Take screenshots (Selenium)
3° Compare the screenshots visually (Markalizer)
4° Produce the output result file (comparison score)
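The four steps above can be sketched in Python as follows. This is only a minimal sketch, not the IMF application itself: the function names, the tab-separated output format, and the `compare_images` callable (a stand-in for whatever wrapper invokes the comparison tool) are assumptions made for illustration. The Selenium calls used (`webdriver.Firefox()`, `get`, `save_screenshot`, `quit`) are part of the Selenium Python bindings.

```python
import os
import tempfile


def take_screenshot(url, png_path):
    """Load `url` in Firefox through Selenium and save a PNG screenshot."""
    # Imported lazily so the rest of the sketch can run without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)                    # step 1: load the page
        driver.save_screenshot(png_path)   # step 2: take the screenshot
    finally:
        driver.quit()
    return png_path


def run_comparison(url_a, url_b, compare_images, out_path, screenshot=take_screenshot):
    """Run steps 1 to 4 for one pair of URLs and append the score to `out_path`.

    `compare_images(png_a, png_b)` is a hypothetical wrapper around the
    comparison tool; it is expected to return a number.
    """
    workdir = tempfile.mkdtemp(prefix="pair_")
    png_a = screenshot(url_a, os.path.join(workdir, "a.png"))
    png_b = screenshot(url_b, os.path.join(workdir, "b.png"))
    score = float(compare_images(png_a, png_b))          # step 3: visual comparison
    with open(out_path, "a") as out:                     # step 4: result file
        out.write("%s\t%s\t%f\n" % (url_a, url_b, score))
    return score
```

Passing the screenshot and comparison functions as parameters keeps the workflow logic testable without a browser or the comparison tool being present.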
Goal / Sub-goal:
Performance efficiency / Throughput
- Loading webpages can take time and depends on different factors such as the complexity of the page, the Internet connection, the browser and browser version used and/or the status of remote servers.
- Taking the screenshot using Selenium
- Comparing with Markalizer
- Overhead (preparation of next comparison)
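To see which of these factors dominates throughput, each stage can be timed separately. The following is a minimal sketch under assumptions: the stage names and the `timed` and `report` helpers are hypothetical and are not taken from the IMF application.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage, summed over all comparisons.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the block to stage_seconds[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def report(timings):
    """Return each stage's share of the total time as a fraction between 0 and 1."""
    total = sum(timings.values())
    return {stage: (secs / total if total else 0.0) for stage, secs in timings.items()}
```

Each stage of a comparison would then be wrapped in a block such as `with timed("load"):`, and `report(stage_seconds)` called at the end of the run.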
Reliability / Stability Indicators
The external tools needed are:
- Selenium Firefox (for this evaluation)
- Xvfb (a virtual display server, needed to run Firefox on a virtual screen)
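Running Firefox on a machine without a physical display typically means starting Xvfb first and pointing the `DISPLAY` environment variable at it. The sketch below illustrates this; the display number `:99` and the screen geometry are assumptions, not values taken from the IMF setup.

```python
import os
import subprocess


def xvfb_command(display=":99", width=1280, height=1024, depth=24):
    """Build the Xvfb command line for a single virtual screen (screen 0)."""
    return ["Xvfb", display, "-screen", "0", "%dx%dx%d" % (width, height, depth)]


def start_xvfb(display=":99"):
    """Start Xvfb and direct this process and its children to the virtual display."""
    proc = subprocess.Popen(xvfb_command(display))
    os.environ["DISPLAY"] = display  # Firefox started afterwards renders off-screen
    return proc                      # the caller should call proc.terminate() when done
```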
The application is developed in Python.
All required components are installed separately (as package dependencies).
Reliability / Runtime stability
The result is a floating-point score that quantifies the differences detected between the two images.
|Platform-ID||string||Platform IMF|
|Dataset(s)||string||Pairs of URLs from the IMF web archive|
|Workflow method||string||Python application wrapping and managing Selenium and the Markalizer tool|
|Link(s) to Scenario(s)||URL(s)||WCT1|
Platform IMF 1
|Platform description||String||Cloudera CDH3u2. 3 dual-core low consumption nodes|
|Number of nodes||integer||3|
|Total number of physical CPUs||integer||3|
|CPU specs||string||Dual-core AMD G-T56N at 1600 MHz|
|Total number of CPU-cores||integer||6 Cores (3 * 2 Cores)|
|Total amount of RAM in Gbytes||integer||24GB (3 * 8GB)|
|average CPU-cores for nodes||integer||2|
|average RAM in Gbytes for nodes||integer||8|
|Operating System on nodes||String||Debian 6 squeeze (64bit)|
|Network layer between nodes||String||Local copy between two nodes: 80 MB/s (640 Mbps)|
|Metric||Baseline definition||Baseline value||Goal||Evaluation 1 (01/11/2012)|
|NumberOfObjectsPerHour||Number of comparisons made per hour||0||100||38|
|NumberOfFailedFiles||Number of image screenshots that failed in the workflow||0||0||0|
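The throughput figures in the table can be converted into time per comparison, which makes the gap to the goal easier to read. The short calculation below uses only the measured value (38 per hour) and the goal (100 per hour) from the table; the helper name is illustrative.

```python
def seconds_per_object(objects_per_hour):
    """Average wall-clock seconds spent on one comparison."""
    return 3600.0 / objects_per_hour


measured = 38   # Evaluation 1 (01/11/2012)
goal = 100

print(round(seconds_per_object(measured), 1))  # 94.7 seconds per comparison
print(seconds_per_object(goal))                # 36.0 seconds per comparison
print(round(float(goal) / measured, 2))        # 2.63, the speed-up needed to reach the goal
```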