Comparison of Platform Symphony and Apache Hadoop Using Berkeley SWIM

IBM asked STAC® to compare IBM Platform Symphony Advanced Edition to Apache Hadoop in the same hardware environment, using an off-the-shelf workload written to the Hadoop MapReduce API. This report documents and analyzes the test results and describes the system configuration and test methodology in detail.

For this project, we used the Statistical Workload Injector for MapReduce (SWIM), developed at the University of California, Berkeley. SWIM provides a large set of diverse MapReduce jobs based on production Hadoop traces obtained from Facebook, along with information that enables characterization of each job.

The testbed hardware consisted of 17 compute servers and one master server communicating over Gigabit Ethernet. We compared Hadoop version 1.0.1 to Symphony version 5.2. Both systems used default configurations except where noted.

Big data has become a big topic at STAC. A quick glance at the discussions and presentations at STAC Summits over the past few years (particularly in New York and London) confirms this.