This is a brief outline of the web archiving team's Hadoop cluster, on which the BL SCAPE team have negotiated access and storage space so that it can act as our SCAPE testbed.
- Red Hat Enterprise Linux Server release 5.7 (Tikanga)
- Cloudera CDH3u1
- Hadoop 0.20.2
- Java 1.6.0_26
We have two racks, each containing 25 HP DL120 G6 servers. Each server (node) is:
CPU: Intel Xeon X3450 (2.66GHz/4-core/8MB/95W)
NIC: 2x HP NC107i PCI Express Gigabit Server Adapter
DISK: Upgraded to include 3 x 2TB disks per node.
In total, we therefore have 200 cores, 800 GB of RAM and 300 TB of raw disk space for HDFS (i.e. 100 TB of usable storage once every block is stored in triplicate).
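The totals above follow directly from the per-node figures. A minimal sketch of the arithmetic, assuming a replication factor of 3 (our reading of "triply-redundant", and also the HDFS default); the constant and function names are illustrative only, and the per-node RAM figure is inferred from the 800 GB total rather than stated in the node specification:

```python
# Per-node figures taken from the hardware description above.
RACKS = 2
NODES_PER_RACK = 25
CORES_PER_NODE = 4      # Xeon X3450 is a 4-core part
DISKS_PER_NODE = 3
DISK_SIZE_TB = 2
TOTAL_RAM_GB = 800      # stated total; per-node RAM is not listed explicitly
REPLICATION = 3         # assumption: three copies of every HDFS block


def cluster_totals():
    """Return aggregate capacity figures for the worker nodes."""
    nodes = RACKS * NODES_PER_RACK
    raw_tb = nodes * DISKS_PER_NODE * DISK_SIZE_TB
    return {
        "nodes": nodes,
        "cores": nodes * CORES_PER_NODE,
        "raw_tb": raw_tb,
        # Usable space is the raw space divided by the number of copies kept.
        "usable_tb": raw_tb / REPLICATION,
        # Inferred, not stated: RAM per node implied by the 800 GB total.
        "ram_per_node_gb": TOTAL_RAM_GB / nodes,
    }


totals = cluster_totals()
print(totals)
```

Note that the usable figure ignores any space reserved for the operating system, MapReduce intermediate data or non-HDFS use, so the practical capacity will be somewhat lower.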
Each rack contains a FASTIRON EDGE X448 (FESX448) switch, and the two racks are connected via a 10Gb link.
A separate server acts as the user front-end, from which Hadoop jobs are submitted and on which any post-processing is performed.
CPU: 24 cores
The Name Node and Job Tracker run on separate servers. Both are:
HP ProLiant DL160 G6 rack server (1U)
CPU: Intel Xeon (Quad-Core) (E5520) 2.26GHz
NIC: Embedded HP NC362i Dual Port 1 Gigabit