

Radu Pop, Internet Memory


IMF monitors the quality of the web sites it archives. Quality is currently assured by visual inspection: the live site on the Internet is compared with the archived copy on the IMF servers.
To improve this process, IMF is developing an application that uses Markalizer, a tool developed by UPMC, to compare two images. The images are produced by a Selenium-based framework (v2.24.1), which takes two snapshots: ideally, one from the archive access and one from the live site.

This evaluation uses screenshots of pages from the IMF Web Archive captured at two different dates.
Note also that only one node of the platform was used for this specific test.
The workflow consists of the following steps:
1° Load a pair of Web Archive pages (two URLs given)
2° Take the screenshots (Selenium)
3° Compare the screenshots visually (Markalizer)
4° Produce the output result file (comparison score)
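
The four steps above could be wired together roughly as follows. This is a minimal sketch, not the IMF application: `run_comparisons`, `take_screenshot_selenium` and the tab-separated output layout are illustrative assumptions, and the Markalizer step is passed in as a plain callable because its invocation is not described on this page. Only `webdriver.Firefox`, `get`, `save_screenshot` and `quit` are actual Selenium calls.

```python
import csv
import os
from typing import Callable, Iterable, Tuple


def take_screenshot_selenium(url: str, png_path: str) -> str:
    """Steps 1-2: load `url` in Firefox and save a PNG screenshot."""
    # Imported lazily so the rest of the sketch runs without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)                   # step 1: load the page
        driver.save_screenshot(png_path)  # step 2: take the screenshot
    finally:
        driver.quit()
    return png_path


def run_comparisons(
    pairs: Iterable[Tuple[str, str]],
    screenshot: Callable[[str, str], str],
    compare: Callable[[str, str], float],
    out_path: str,
    work_dir: str = ".",
) -> int:
    """Run the workflow for each URL pair; return the number of rows written."""
    written = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for index, (url_a, url_b) in enumerate(pairs):
            shot_a = screenshot(url_a, os.path.join(work_dir, f"{index}_a.png"))
            shot_b = screenshot(url_b, os.path.join(work_dir, f"{index}_b.png"))
            score = compare(shot_a, shot_b)         # step 3: visual comparison
            writer.writerow([url_a, url_b, score])  # step 4: output result
            written += 1
    return written
```

Passing the screenshot and comparison steps as callables keeps the loop independent of the browser and of the comparison tool, so either can be replaced or stubbed out.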

Goal / Sub-goal:
          Performance efficiency / Throughput

  • Loading web pages can take time, depending on factors such as the complexity of the page, the Internet connection, the browser and browser version used, and the status of remote servers.
  • Taking the screenshot using Selenium
  • Comparing with Markalizer
  • Overhead (preparation of the next comparison)
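
To see which of these phases dominates the time per comparison, each one can be timed separately. A minimal sketch, assuming phase names of our own choosing (for example `load`, `screenshot`, `compare`, `overhead`); `PhaseTimer` is a hypothetical helper, not part of the evaluated application.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock seconds per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the phase raised an exception.
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Fraction of the total measured time spent in each phase."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: secs / total for name, secs in self.totals.items()}
```

Each step of a comparison would then be wrapped in `with timer.phase("load"):` and so on, and `timer.shares()` inspected at the end of the run.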

Reliability / Stability Indicators
The external tools needed are:

  • Selenium with Firefox (for this evaluation)
  • Xvfb (a virtual display server, needed to run Firefox on a virtual screen)
  • Markalizer

The application itself is developed in Python. All required components are installed separately, as package dependencies.
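
Because Firefox needs a display, it has to be pointed at the Xvfb virtual screen. A rough sketch of how that could be done from Python, assuming display number `:99` and a 1280x1024x24 screen (both arbitrary choices, not taken from the evaluation); `xvfb_command` and `start_xvfb` are hypothetical helper names.

```python
import os
import subprocess


def xvfb_command(display=":99", width=1280, height=1024, depth=24):
    """Build the Xvfb command line for one virtual screen."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def start_xvfb(display=":99", **screen):
    """Start Xvfb and point this process (and its children) at it."""
    process = subprocess.Popen(xvfb_command(display, **screen))
    # Firefox launched by Selenium inherits DISPLAY from the environment.
    os.environ["DISPLAY"] = display
    return process  # call process.terminate() when the run is finished
```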

Reliability / Runtime stability

The result of each comparison is a floating-point score that quantifies the differences detected between the two images.

Field | Data type | Value
Evaluation-Date | DD/MM/YY | 01/11/2012
Platform-ID | string | Platform IMF
Dataset(s) | string | Pairs of URLs from IMF web archive
Workflow method | string | Python application wrapping and managing Selenium and the Markalizer tool
Workflow(s) involved | URL(s) |
Tool(s) involved | URL(s) |
Link(s) to Scenario(s) | URL(s) | WCT1



Platform IMF 1

Field | Data type | Value
Platform-ID | String | IMF Cluster
Platform description | String | Cloudera CDH3u2; 3 dual-core low-consumption nodes
Number of nodes | integer | 3
Total number of physical CPUs | integer | 3
CPU specs | string | Dual-core AMD G-T56N at 1600 MHz
Total number of CPU-cores | integer | 6 cores (3 * 2 cores)
Total amount of RAM in Gbytes | integer | 24 GB (3 * 8 GB)
Average CPU-cores for nodes | integer | 2
Average RAM in Gbytes for nodes | integer | 8
Operating System on nodes | String | Debian 6 squeeze (64-bit)
Storage system/layer | String | HDFS
Network layer between nodes | String | Local copy between two nodes: 80 MB/s (640 Mbps)

Evaluation points

Metric | Baseline definition | Baseline value | Goal | Evaluation 1 (01/11/2012)
NumberOfObjectsPerHour | Number of comparisons made per hour | 0 | 100 | 38
NumberOfFailedFiles | Number of screenshots that failed in the workflow | 0 | 0 | 0
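
For reference, the throughput figures above translate into time per comparison as follows. This is plain arithmetic on the reported values (38 measured, 100 as the goal); the helper name is illustrative.

```python
SECONDS_PER_HOUR = 3600


def seconds_per_comparison(comparisons_per_hour):
    """Average wall-clock seconds spent on one comparison."""
    return SECONDS_PER_HOUR / comparisons_per_hour


measured = seconds_per_comparison(38)   # Evaluation 1: about 94.7 s
target = seconds_per_comparison(100)    # Goal: 36.0 s
speedup_needed = 100 / 38               # about 2.63x

print(f"measured: {measured:.1f} s, target: {target:.1f} s, "
      f"speedup needed: {speedup_needed:.2f}x")
```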
