This page collects benchmark results from SCAPE partners' Hadoop installations so that the installations can be compared.
Details are in this blog post: http://openplanetsfoundation.org/blogs/2013-09-30-let%E2%80%99s-benchmark-our-hadoop-clusters-join
Aggregate results
Ratios of throughput (bytes/sec), relative to BL
BL = British Library; SB 1 and SB 2 = the Danish State and University Library results of 2013-11-26 and 2014-01-24, respectively.
Type | BL | SB 1 | SB 2 | Limiting capabilities in test |
NUTCHINDEX | 1.0 | 3.2 | 1.9 | Balanced |
WORDCOUNT | 1.0 | 3.4 | 4.2 | CPU-bound |
DFSIOE-READ | 1.0 | 2.5 | 1.9 | IO-bound |
DFSIOE-WRITE | 1.0 | 3.4 | 1.4 | IO-bound |
HIVEAGGR | 1.0 | N/A | 39.4 | ? |
HIVEJOIN | 1.0 | N/A | 68.6 | ? |
KMEANS | 1.0 | N/A | 2.1 | Map: CPU-bound, Reduce: IO-bound |
PAGERANK | 1.0 | 3.1 | 1.2 | ? Network-bound? |
BAYES | 1.0 | 1.8 | 1.7 | Balanced |
SORT | 1.0 | 2.9 | 0.9 | IO-bound |
TERASORT | 1.0 | 2.2 | 1.2 | RAM-bound, Map: CPU-bound, Reduce: IO-bound |
Categories from: https://github.com/intel-hadoop/HiBench/raw/master/WISS10_conf_full_011.pdf
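The ratios above appear to be each cluster's throughput divided by the British Library (BL) throughput for the same workload, rounded to one decimal. The following is a minimal Python sketch under that assumption, using three rows copied from the per-cluster result tables on this page. The names `THROUGHPUT`, `throughput_ratio` and `ratios_for` are illustrative and not part of HiBench.

```python
from typing import Dict, Optional

# Throughput (bytes/s) copied from the per-cluster result tables on this page.
# None marks a workload that has no result for that cluster.
THROUGHPUT: Dict[str, Dict[str, Optional[int]]] = {
    "NUTCHINDEX": {"BL": 2425366, "SB 1": 7687596, "SB 2": 4568839},
    "WORDCOUNT": {"BL": 69300634, "SB 1": 233907918, "SB 2": 289195018},
    "HIVEAGGR": {"BL": 78363540, "SB 1": None, "SB 2": 3090849667},
}


def throughput_ratio(value: Optional[int], baseline: int) -> Optional[float]:
    """Return value / baseline rounded to one decimal, or None if there is no result."""
    if value is None:
        return None
    return round(value / baseline, 1)


def ratios_for(workload: str, baseline_cluster: str = "BL") -> Dict[str, Optional[float]]:
    """Compute the ratio of every cluster to the baseline cluster for one workload."""
    row = THROUGHPUT[workload]
    baseline = row[baseline_cluster]
    return {cluster: throughput_ratio(value, baseline) for cluster, value in row.items()}


if __name__ == "__main__":
    for workload in THROUGHPUT:
        print(workload, ratios_for(workload))
```

Workloads without a result for a cluster yield `None` in this sketch, which keeps a missing run distinguishable from a genuinely low ratio.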
Approximate ratios, per workload type (by manual estimation)
Workload type | BL | SB 1 | SB 2 |
Balanced workload (IO/RAM/CPU) | 1 | 2.5 | 1.8 |
IO-bound workload | 1 | 3 | 1.7 |
CPU-bound workload | 1 | 3.4 | 4.2 |
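The per-type figures are stated to be manual estimates, so the method behind them is not documented here. One possible way to derive comparable numbers is a plain arithmetic mean of the per-benchmark ratios within each category. The sketch below assumes that approach and includes only the benchmarks that the aggregate table assigns to a single category; `RATIOS` and `mean_ratio_by_category` are illustrative names.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (category, SB 1 ratio, SB 2 ratio) copied from the aggregate table on this page.
# Benchmarks with mixed or unknown categories are left out of this sketch.
RATIOS: Dict[str, Tuple[str, float, float]] = {
    "NUTCHINDEX": ("Balanced", 3.2, 1.9),
    "BAYES": ("Balanced", 1.8, 1.7),
    "DFSIOE-READ": ("IO-bound", 2.5, 1.9),
    "DFSIOE-WRITE": ("IO-bound", 3.4, 1.4),
    "SORT": ("IO-bound", 2.9, 0.9),
    "WORDCOUNT": ("CPU-bound", 3.4, 4.2),
}


def mean_ratio_by_category(
    ratios: Dict[str, Tuple[str, float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Group the ratios by category and average SB 1 and SB 2 separately."""
    grouped: Dict[str, List[Tuple[float, float]]] = {}
    for category, sb1, sb2 in ratios.values():
        grouped.setdefault(category, []).append((sb1, sb2))
    return {
        category: (
            round(mean([pair[0] for pair in pairs]), 1),
            round(mean([pair[1] for pair in pairs]), 1),
        )
        for category, pairs in grouped.items()
    }


if __name__ == "__main__":
    for category, (sb1, sb2) in mean_ratio_by_category(RATIOS).items():
        print(f"{category}: SB 1 = {sb1}, SB 2 = {sb2}")
```

A plain mean reproduces the Balanced and CPU-bound rows but not the IO-bound row, so the manual estimate evidently weighted the benchmarks differently; treat the sketch as a cross-check rather than as the method used for the table.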
British Library
Cluster:
Results for the British Library Digital Preservation Hadoop cluster:
(1 JobTracker/NameNode, 28 TaskTracker/DataNodes, 6GB RAM/1 CPU/500GB HDD per node)
Results:
Type | Date | Time | Input_data_size | Duration(s) | Throughput(bytes/s) | Throughput/node |
NUTCHINDEX | 18/09/2013 | 14:11:02 | 586453704 | 241.8 | 2425366 | 86620 |
WORDCOUNT | 18/09/2013 | 11:30:51 | 89600175810 | 1292.92 | 69300634 | 2475022 |
DFSIOE-READ | 18/09/2013 | 11:51:54 | 54005188956 | 368.998 | 146356318 | 5227011 |
DFSIOE-WRITE | 18/09/2013 | 12:00:10 | 27323986498 | 488.494 | 55935152 | 1997684 |
HIVEAGGR | 18/09/2013 | 12:12:02 | 17713294775 | 226.04 | 78363540 | 2798697 |
HIVEJOIN | 18/09/2013 | 12:17:34 | 18540722846 | 305.227 | 60744045 | 2169430 |
KMEANS | 19/09/2013 | 09:58:57 | 504003386 | 334.933 | 1504788 | 53742 |
PAGERANK | 18/09/2013 | 12:23:11 | 398276167 | 186.61 | 2134270 | 76223 |
BAYES | 18/09/2013 | 14:29:29 | 177879705 | 1004.847 | 177021 | 6322 |
SORT | 18/09/2013 | 12:47:02 | 67200178803 | 913.232 | 73585002 | 2628035 |
TERASORT | 18/09/2013 | 12:53:09 | 10000000000 | 201.578 | 49608588 | 1771735 |
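The derived columns in these tables can be checked from the raw ones: throughput is the input size divided by the duration, and throughput per node divides that by the number of TaskTracker/DataNodes (28 for this cluster). The sketch below assumes that results are truncated to whole numbers, which matches the rows checked here; the function names are illustrative.

```python
def throughput(input_bytes: int, duration_s: float) -> int:
    """Bytes per second, truncated to an integer (assumed to match the tables)."""
    return int(input_bytes / duration_s)


def throughput_per_node(input_bytes: int, duration_s: float, nodes: int) -> int:
    """Cluster throughput divided evenly across the worker nodes, truncated."""
    return throughput(input_bytes, duration_s) // nodes


if __name__ == "__main__":
    # NUTCHINDEX row from the British Library table; 28 TaskTracker/DataNodes.
    print(throughput(586453704, 241.8))
    print(throughput_per_node(586453704, 241.8, 28))
```

Running the same check against another cluster's table only requires swapping in that cluster's node count.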
Danish State and University Library
(1 JobTracker/NameNode, 3 TaskTracker/DataNodes, 96GB RAM / 2 CPU (6 cores - 12 threads) / ? HDD per node)
Results:
Type | Date | Time | Input_data_size | Duration(s) | Throughput(bytes/s) | Throughput/node |
NUTCHINDEX | 2013-11-26 | 09:28:55 | 594889272 | 77.383 | 7687596 | 2562532 |
WORDCOUNT | 2013-11-26 | 09:38:15 | 96000020276 | 410.418 | 233907918 | 77969306 |
DFSIOE-READ | 2013-11-26 | 09:44:04 | 54005177946 | 150.323 | 359260911 | 119753637 |
DFSIOE-WRITE | 2013-11-26 | 09:46:29 | 27319740657 | 142.999 | 191048473 | 63682824 |
HIVEAGGR | N/A (Hive not installed) |
HIVEJOIN | N/A (Hive not installed) |
KMEANS | N/A (problem not identified) |
PAGERANK | 2013-11-26 | 09:48:20 | 398276167 | 60.617 | 6570370 | 2190123 |
BAYES | 2013-11-26 | 09:58:18 | 180094898 | 574.317 | 313580 | 104526 |
SORT | 2013-11-26 | 10:05:53 | 72000018177 | 332.034 | 216845317 | 72281772 |
TERASORT | 2013-11-26 | 10:08:12 | 10000000000 | 89.762 | 111405717 | 37135239 |
Note: An EMC Isilon setup handles the NameNode and DataNode roles.
Type | Date | Time | Input_data_size | Duration(s) | Throughput(bytes/s) | Throughput/node |
NUTCHINDEX | 2014-01-24 | 08:45:46 | 594890284 | 130.206 | 4568839 | 1142209 |
WORDCOUNT | 2014-01-24 | 08:58:56 | 128000028859 | 442.608 | 289195018 | 72298754 |
DFSIOE-READ | 2014-01-24 | 09:11:23 | 54005212089 | 193.870 | 278564048 | 69641012 |
DFSIOE-WRITE | 2014-01-24 | 09:17:07 | 27322382558 | 341.443 | 80020332 | 20005083 |
HIVEAGGR | 2014-01-24 | 09:26:13 | 17590025456 | 5.691 | 3090849667 | 772712416 |
HIVEJOIN | 2014-01-24 | 09:26:27 | 18417453527 | 4.420 | 4166844689 | 1041711172 |
KMEANS | 2014-01-24 | 09:30:09 | 504003386 | 159.311 | 3163644 | 790911 |
PAGERANK | 2014-01-24 | 09:33:25 | 398276167 | 152.253 | 2615883 | 653970 |
BAYES | 2014-01-24 | 09:43:39 | 180094898 | 589.064 | 305730 | 76432 |
SORT | 2014-01-24 | 10:13:31 | 96000029057 | 1519.260 | 63188676 | 15797169 |
TERASORT | 2014-01-24 | 10:18:33 | 10000000000 | 168.287 | 59422296 | 14855574 |
Note: Each node uses network storage (an NFS mount) as its 'local' storage.