Friday, October 18, 2013

Track 2 and Comet/Wrangler

Just before our federal government shut down, the National Science Foundation announced that SDSC would be awarded a $12 million piece of its most recent ACI Track 2 solicitation, Track 2F.  The Track 2 program, initially designed to award $30 million (on a competitive basis) every year to deploy a new supercomputer into XSEDE, has funded the majority of the compute cycles available via XSEDE.  Here's a brief summary:
  1. Track 2A funded Ranger (OCI-0622780), a 479-teraflop Sun Constellation cluster at TACC
  2. Track 2B funded Kraken (OCI-0711134), a Cray XT machine that evolved into a petaflop machine at NICS
  3. Track 2C was going to fund a 197-rack SGI Altix UV at PSC, but SGI went bankrupt
  4. Track 2D funded three separate projects of an experimental design:
    1. Gordon, a data-intensive supercomputer at SDSC (OCI-0910847)
    2. Keeneland, a GPU-rich supercomputer at NICS (but led by Georgia Tech; OCI-0910735)
    3. FutureGrid, a physically distributed cloud testbed led by Indiana and housed all over the US (OCI-0910812)
  5. Track 2E funded Stampede, a 2.2-petaflop machine at TACC with an additional 7.4 petaflops of Intel Xeon Phi coprocessors (OCI-1134872)
  6. Track 2F will fund two systems:
    1. Comet at SDSC (initially named Wildfire; ACI-1341698), and
    2. Wrangler at TACC (ACI-1341711)
This history of Track 2 should illustrate how big a deal these awards are, and it's pretty exciting to be involved in the process of proposing, winning, and deploying a machine of this scale.


Unfortunately, the details on these two Track 2F systems are sparse, and deliberately so.  Both systems will be deployed in late 2014 and put into production by 2015, meaning the hardware that was proposed and awarded hasn't even hit the market yet and probably remains, quite literally, an industry secret.

Both the SDSC press release and the NSF award abstract for Comet emphasize that Comet will be more of a capacity machine than a capability machine and that it will serve the needs of the "98%" of XSEDE users who aren't running massively parallel jobs.  While this figure may sound surprising, it's really true; you can verify it yourself by logging into XDMoD and following this procedure:
  1. click the usage tab
  2. collapse the "Jobs Summary" option on the left, then expand the "Jobs by Job Size" option
  3. click the "number of jobs" option
  4. along the top (under the row of tabs), change "Duration:" to "previous year"
In some sense, the overall mission of this machine will be a lot like that of SDSC's Trestles resource, which is specially tuned for fast turnaround times and small jobs.  The big difference is that Comet's acquisition cost is 4x that of Trestles, and its target peak FLOPS are 18x-20x higher.

This isn't to say that Comet will be a completely vanilla system like Trestles tends to be, though; in the press release, SDSC's director made the bold statement that Comet will be "the world’s first virtualized HPC cluster."  This admittedly sounds a bit hokey since I've detailed why cloud computing is no good for HPC, but Comet will actually be quite different in that its use of SR-IOV will allow virtual sub-clusters to run applications over InfiniBand at near-native speeds.  A few colleagues and I did a good amount of application testing and benchmarking to make sure that the approach was viable, and I'm excited to see this stuff make it into production.

Speaking of sub-clusters, the press release also says that each Comet rack will have 72 nodes connected at full-bisection FDR bandwidth.  The inter-rack links will be oversubscribed (4:1), much like Gordon's inter-switch torus links (16:3), suggesting that 72 nodes will be a magic number for maximum optimal job size, akin to Gordon's 16-node magic size.  How these architectural details translate into policies remains to be announced by SDSC, and hopefully a lot more juicy details about Comet will be released next month at SC.  Last I heard, Comet will be officially announced at SDSC's booth on Tuesday afternoon.
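
To put the two oversubscription ratios mentioned above on the same footing, here is a rough back-of-the-envelope sketch.  It assumes the simplest possible model, which is my own and not anything SDSC has published: every node behind a switch tries to talk off-switch at once, so each node's share of its injection bandwidth is just the inverse of the oversubscription ratio.  Gordon's torus and Comet's inter-rack links are different topologies, so treat this as a sanity check rather than a performance prediction; the function names are made up for illustration.

```python
from fractions import Fraction


def oversubscription(downlinks, uplinks):
    """Oversubscription ratio, e.g. 16:3 -> Fraction(16, 3)."""
    return Fraction(downlinks, uplinks)


def worst_case_fraction(ratio):
    """Share of per-node bandwidth left when every node goes off-switch.

    Assumes a naive model where all nodes contend for the uplinks at once.
    """
    return 1 / ratio


# Ratios as quoted in the post: Comet inter-rack 4:1, Gordon inter-switch 16:3.
comet = oversubscription(4, 1)
gordon = oversubscription(16, 3)

for name, ratio in (("Comet", comet), ("Gordon", gordon)):
    share = worst_case_fraction(ratio)
    print(f"{name}: {float(ratio):.2f}:1 oversubscribed, "
          f"worst-case share {float(share):.1%}")
```

Under this naive model, Gordon's 16:3 works out to a bit more than 5:1, so Comet's 4:1 inter-rack links are slightly less oversubscribed than Gordon's inter-switch links.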


HPCwire ran a very enlightening article on Wrangler based on what I assume was an interview with co-PI Chris Jordan, and in doing so, spilled some beans about Intel's upcoming Haswell-based Xeon processors.  I am hesitant to comment further since I know that numbers like the cores-per-processor mentioned in the HPCwire article haven't even been announced by Intel yet and thus may be a non-disclosure minefield, but the HPCwire article and the NSF award abstract do highlight a few interesting points about Wrangler:
  • it will have "3000 next generation" Intel cores
  • it will be a "Dell-supplied 120 node cluster"
  • the award payout was only $5 mil instead of the $6 mil put forth in the Track 2F solicitation (although this may have to do with the end of FY'13, the government shutdown, etc)
The cost per node works out to be ludicrously high; by comparison, SDSC's Trestles system has 324 nodes, each with 32 cores, 64 GB RAM, and an SSD, for a measly $3 million.  The fact that twice the hardware budget is buying about a third of the nodes suggests that Wrangler is going to have some fancy (read: expensive) magic under the hood.  The Track 2 awards are historically very cost-sensible though, so it's highly unlikely that the NSF got taken for a ride by awarding this proposal.  DSSD, a company founded by Andy Bechtolsheim (who also co-founded Sun Microsystems and Arista and had a hand in developing Sun's famed Thumper), is a key strategic partner in Wrangler too, so there might be some really revolutionary ideas arising from this award.
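
For the curious, the cost-per-node comparison above is just simple division, sketched below.  It makes one big assumption: that the entire dollar figure goes to node hardware, which certainly overstates things since these budgets also cover storage, networking, and more.  I run the Wrangler numbers for both the $6 mil solicitation figure and the $5 mil award figure, and the variable names are mine.

```python
def cost_per_node(budget_usd, nodes):
    """Naive cost per node, assuming the whole budget buys nodes."""
    return budget_usd / nodes


# Figures quoted in the post.
trestles = cost_per_node(3_000_000, 324)
wrangler_solicited = cost_per_node(6_000_000, 120)  # Track 2F solicitation
wrangler_awarded = cost_per_node(5_000_000, 120)    # actual award payout

print(f"Trestles:            ${trestles:,.0f} per node")
print(f"Wrangler ($6 mil):   ${wrangler_solicited:,.0f} per node "
      f"({wrangler_solicited / trestles:.1f}x Trestles)")
print(f"Wrangler ($5 mil):   ${wrangler_awarded:,.0f} per node "
      f"({wrangler_awarded / trestles:.1f}x Trestles)")
```

Either way, each Wrangler node comes out at roughly four to five times the price of a Trestles node, which is what makes me think there is something unusual in the design.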

Again, I look forward to hearing more details in Denver next month.