Comparison of IBM InfoSphere BigInsights Enterprise Edition with Apache Hadoop using SWIM
IBM asked STAC® to compare pure Apache Hadoop to BigInsights® with Adaptive MapReduce enabled, in the same hardware environment, using an off-the-shelf workload written to the Hadoop MapReduce API. This report documents and analyzes the test results and describes the system configuration and test methodology in detail.
For this project, we used the Statistical Workload Injector for MapReduce (SWIM) developed by the University of California at Berkeley. SWIM provides a large set of diverse MapReduce jobs based on production Hadoop traces obtained from Facebook, along with information to enable characterization of each job.
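One way to picture the per-job characterization that SWIM enables is to read each trace record and bucket the job by its data volumes. The sketch below is a hypothetical illustration, not SWIM's own tooling or this report's methodology: it assumes a six-column, tab-separated record (job ID, submit time, inter-job gap, map input bytes, shuffle bytes, reduce output bytes), and the `SwimJob`, `parse_trace_line`, and `characterize` names, the 1 GiB size cutoff, and the category labels are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class SwimJob:
    job_id: str
    submit_time_s: int   # seconds since trace start
    gap_s: int           # seconds since the previous job's submission
    input_bytes: int     # map input volume
    shuffle_bytes: int   # map output moved to reducers
    output_bytes: int    # reduce output volume


def parse_trace_line(line: str) -> SwimJob:
    # ASSUMPTION: six tab-separated fields per job, in the order listed above.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    job_id, *nums = fields
    submit, gap, inp, shuf, out = (int(x) for x in nums)
    return SwimJob(job_id, submit, gap, inp, shuf, out)


def characterize(job: SwimJob) -> str:
    # Hypothetical buckets: overall size plus the shape of the data flow.
    total = job.input_bytes + job.shuffle_bytes + job.output_bytes
    if job.shuffle_bytes == 0:
        shape = "map-only"
    elif job.shuffle_bytes > job.input_bytes:
        shape = "shuffle-expanding"
    else:
        shape = "aggregating"
    size = "small" if total < (1 << 30) else "large"  # arbitrary 1 GiB cutoff
    return f"{size}/{shape}"


if __name__ == "__main__":
    trace = [
        "job0\t0\t0\t1000\t0\t500",
        "job1\t5\t5\t2147483648\t4294967296\t1024",
    ]
    for line in trace:
        job = parse_trace_line(line)
        print(job.job_id, characterize(job))
```

Bucketing by data flow rather than by job name matters because a MapReduce framework's overhead differs sharply between many small, short jobs and a few large, shuffle-heavy ones.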
The testbed hardware consisted of 17 compute servers and 1 master server communicating over 10 Gigabit Ethernet. We compared Apache Hadoop version 1.1.2 with IBM BigInsights version 18.104.22.168. Both systems used default configurations except where noted.
Stack under test:
- IBM InfoSphere BigInsights Enterprise Edition 22.214.171.124 or Apache Hadoop 1.1.2
- 18 x IBM System x3630 M3 servers
- Red Hat Enterprise Linux 6.4
- 12 x IBM 2TB SAS hard drives per server
- 2 x 6-core Intel® Xeon® E5645 @ 2.4GHz ("Westmere")
- Mellanox MT26448 ConnectX EN 10 Gbps adapters
- Juniper Networks QFX35
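As a rough orientation to the scale of the testbed, the aggregate compute and storage capacity of the compute servers can be worked out from the stack list above. This minimal sketch assumes that only the 17 compute servers run tasks and store HDFS blocks, that all 12 drives per server are devoted to HDFS, and that the default HDFS replication factor of 3 applies. It ignores hyper-threading, decimal vs. binary terabytes, and space taken by the OS, logs, and intermediate MapReduce data.

```python
COMPUTE_NODES = 17
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 6
DRIVES_PER_NODE = 12
DRIVE_TB = 2
HDFS_REPLICATION = 3  # ASSUMPTION: default dfs.replication, not confirmed by the report


def cluster_capacity(nodes: int = COMPUTE_NODES,
                     replication: int = HDFS_REPLICATION) -> tuple[int, int, float]:
    """Return (physical cores, raw disk TB, usable HDFS TB) for the given node count."""
    cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
    raw_tb = nodes * DRIVES_PER_NODE * DRIVE_TB
    usable_tb = raw_tb / replication  # each block is stored `replication` times
    return cores, raw_tb, usable_tb


if __name__ == "__main__":
    cores, raw_tb, usable_tb = cluster_capacity()
    print(f"{cores} cores, {raw_tb} TB raw, ~{usable_tb:.0f} TB usable in HDFS")
```

Under these assumptions the compute tier offers 204 physical cores and 408 TB of raw disk, or roughly 136 TB of HDFS capacity after triple replication.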