MPI Benchmarks: Amazon EC2 Cluster Compute vs. Myrinet 10Gig

The data and opinions expressed here are solely my own and do not reflect those of my employer or the providers of the computing resources used.

As a follow-up to my previous post on how to set up MPI across Amazon's Cluster Compute nodes, here are the results I got from the OSU Micro-Benchmarks.  I ran each point-to-point test five times, and the results shown are the average of those five trials.  For comparison, I also ran the same benchmarks on a Myrinet 10G cluster.
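For anyone unfamiliar with what these point-to-point tests actually measure, the sketch below shows the basic idea: a ping-pong between two ranks, swept over message sizes, with warm-up iterations excluded from the timing.  This is not the OSU code itself; the message sizes, iteration counts, and reporting format here are illustrative choices.

```c
/* Minimal ping-pong sketch in the spirit of the OSU point-to-point tests.
 * Not the actual OSU code: sizes, iteration counts, and warm-up length are
 * illustrative. Run with two ranks, ideally one per node. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MSG (4 * 1024 * 1024)
#define ITERS   100
#define WARMUP  10

int main(int argc, char **argv)
{
    int rank;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(MAX_MSG);
    memset(buf, 'a', MAX_MSG);

    for (size_t size = 1; size <= MAX_MSG; size *= 2) {
        double start = 0.0;
        for (int i = 0; i < ITERS + WARMUP; i++) {
            if (i == WARMUP) {               /* discard warm-up iterations */
                MPI_Barrier(MPI_COMM_WORLD);
                start = MPI_Wtime();
            }
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0) {
            double elapsed = MPI_Wtime() - start;
            /* ITERS round trips: report half-round-trip latency and the
             * effective one-way bandwidth. */
            double latency_us = elapsed / ITERS / 2.0 * 1e6;
            double bw_MBps    = (double)size * ITERS / (elapsed / 2.0) / 1e6;
            printf("%8zu bytes  %10.2f us  %10.2f MB/s\n", size, latency_us, bw_MBps);
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and launch with two ranks pinned to different nodes; the real osu_bw test uses a windowed, non-blocking scheme to saturate the link, so treat the numbers from a sketch like this as a rough sanity check rather than a substitute.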


The numbers speak for themselves; EC2's interconnect performance is not great, and the disparity only worsens when comparing EC2 to InfiniBand (e.g., see Adam DeConinck's blog, which compared the previous generation of Cluster Compute instances to QDR InfiniBand).

I also ran a quantum chemistry application across four EC2 CC instances to benchmark real-world performance.  With identical processors (two Intel Xeon E5-2670) and 60 GB of RAM in both cases, performance on EC2 was about 30% worse than on our reference cluster.

If anyone is interested, I have the results from running almost the entire suite of OSU benchmarks on an EC2 CC cluster.  The data above cover only the point-to-point communications, but the one-sided and collective communications really weren't any better.
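The collective tests follow the same pattern as the point-to-point ones, just timed across all ranks.  Here is a rough sketch, loosely in the spirit of osu_allreduce; the iteration count, message sizes, and the choice to report the slowest rank are my own illustrative choices, not the OSU defaults.

```c
/* Sketch of a collective timing loop, loosely in the spirit of osu_allreduce.
 * Iteration count and message sizes are illustrative, not the OSU defaults. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_COUNT (1 << 20)   /* up to 1M doubles (8 MB) per buffer */
#define ITERS     100

int main(int argc, char **argv)
{
    int rank;
    double *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sendbuf = calloc(MAX_COUNT, sizeof(double));
    recvbuf = calloc(MAX_COUNT, sizeof(double));

    for (int count = 1; count <= MAX_COUNT; count *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double start = MPI_Wtime();
        for (int i = 0; i < ITERS; i++)
            MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double local = (MPI_Wtime() - start) / ITERS;

        /* Report the slowest rank's average time; a collective is only as
         * fast as its slowest participant. */
        double worst;
        MPI_Reduce(&local, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("%10zu bytes  %12.2f us\n", count * sizeof(double), worst * 1e6);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```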

Update: I had so much fun running these numbers that I decided to also see how the Blue Gene/P torus interconnect stacked up.  Bear in mind that Blue Gene/P came out back in 2007 and is over five years old now; I think it's fair to say that it predates the widespread adoption of QDR InfiniBand and Myrinet 10G.

Bandwidth Comparison
The hiccups in the otherwise-monotonically increasing bandwidth are due to each MPI stack's tuning parameters.  For example, the maximum message size before switching from an eager (asynchronous) to a rendezvous (synchronous) protocol was 4 KB for Myrinet 10G, 64 KB for TCP (Amazon's 10 Gbit Ethernet), and 1200 bytes for Blue Gene/P.
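If you want to see the eager-to-rendezvous switchover directly rather than infer it from a bandwidth curve, a quick way is to time a blocking send while the receiver deliberately delays posting its receive: below the eager threshold the send typically returns almost immediately (the message is buffered), while above it the send stalls until the receive is posted.  The sketch below illustrates the idea; the message sizes and the one-second delay are arbitrary, and an MPI implementation is free to buffer or not, so treat this as a diagnostic sketch rather than a guarantee.

```c
/* Sketch to expose the eager-to-rendezvous switchover: rank 0 times a
 * blocking MPI_Send while rank 1 sleeps before posting its receive.
 * Message sizes and the 1-second delay are arbitrary choices. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    const size_t sizes[] = { 1024, 16 * 1024, 256 * 1024 };

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = malloc(256 * 1024);
    memset(buf, 0, 256 * 1024);

    for (int s = 0; s < 3; s++) {
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0) {
            double t0 = MPI_Wtime();
            MPI_Send(buf, sizes[s], MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            /* Fast return => eager (buffered); ~1 s => rendezvous. */
            printf("%7zu bytes: MPI_Send returned after %.3f s\n",
                   sizes[s], MPI_Wtime() - t0);
        } else if (rank == 1) {
            sleep(1);   /* delay posting the receive */
            MPI_Recv(buf, sizes[s], MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```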
Bidirectional Bandwidth Comparison

Latency Comparison