Tuesday, October 22, 2013

A supercomputer IS a computer

HPCwire recently ran a featured article in which the author states that, when selling the idea of high-performance computing (HPC) to scientists or technical people, the following distinction should be made "firmly":
"...a supercomputer is NOT a computer – it is a major scientific instrument that just happens to be built using computer technology."
He goes on to say that using supercomputers requires "expertise in the technology," and I gather that the point he's trying to make is that, even if you are capable of using Microsoft Excel on your desktop, you won't necessarily be qualified to use a supercomputer because they are highly specialized scientific instruments.

I found this perspective extremely unsettling, and although I'd normally just grumble about these sorts of posts under my breath and move on, the article is gaining a lot of publicity across social media outlets.  I felt compelled to write a rebuttal, even if nobody reads it, because the author's analogy reflects a viewpoint that is becoming antiquated amidst emerging trends in HPC, and repeating these sorts of statements threatens to stymie progress in the state of supercomputing.

This post of mine wound up a lot longer than I anticipated, so in the interests of providing an abbreviated version, here are my main objections to the HPCwire article's "supercomputers are NOT computers" statement:

  • Supercomputing has traditionally been exclusive because the underlying technology was vastly different from desktop computing.  This is no longer the case.
  • The field of scientific supercomputing is moving towards being more user-friendly and inclusive.  This is reflected in
    • standard APIs and libraries that abstract the hardware differences away from application developers and users,
    • dramatic increases in portability as a result, to the point where many HPC-oriented applications will happily compile and run on your laptop, and
    • the stacks of money the federal government is putting behind making it as easy as possible for new users to use supercomputers, providing substantial financial momentum for moving forward.
  • New interfaces and modalities are emerging where desktop applications can transparently interface with HPC
    • Desktop applications are beginning to adopt support for offloading to HPC on the backend
    • Web-based applications are emerging that entirely mask the role of HPC in the process from input to output
    • The stage is being set for complete device abstraction from HPC at the confluence of web-based frontends and mobile devices

And the usual disclaimer:  This post contains my personal opinions and does not necessarily or authoritatively reflect the opinion of my employer blah blah blah.  Perhaps more importantly, nobody put me up to posting this.  I strongly believe in what I wrote here, and I wrote it because I genuinely feel that the HPC industry needs to move beyond the sentiments expressed by the author of the HPCwire article.  It just so happens that I think my employer agrees with me and is an active part of this evolution.

"A supercomputer is NOT a computer"

I get what the author was trying to say in terms of a supercomputer being similar to a piece of lab equipment.  I spent a little while working in experimental research before (and during) my entry into computational research, and I recall being equally overwhelmed by the expectation to immediately produce results using equipment (be it compute cluster or thermal evaporator) that I was not qualified to use.
I spent a few months working with this plasma sprayer.  I'll take Gordon over this thing any day of the week.

Additional training and education are definitely necessary to get the most out of any research instrument, supercomputers included.  However, coming out and saying supercomputers are "NOT" computers throws up a rather large, all-capital-letters barrier that, when you think about it, serves to cast supercomputers in some light of exclusivity.  The way I see it, the corollaries of this barrier are
  1. There is a huge disconnect between desktop computing and supercomputing.  If you do science on the desktop, you will need to re-learn a lot to do science on a supercomputer.
  2. Thus, not every researcher can use supercomputers because using supercomputers requires specialized training and "expertise in the technology."  Since there are microscopists and there are spectroscopists, it should follow that there are "supercomputerists," and you would need a supercomputerist to supercompute.
All in all, if a supercomputer is NOT a computer, researchers face a long road ahead of them if they want to start applying HPC but have no background in it.  Right?

A supercomputer IS a computer

NO!  Supercomputers ARE computers, and there's not a lot that makes them holier than any other computer system.  To suggest that they are highly sophisticated research instruments that require extensive training and "expertise in the technology" to use is misguided.  You don't need to have a Ph.D. to use one, and in fact you don't even need a high school diploma to do meaningful work on one.

What gets my knickers in a twist about the "NOT a computer" statement is that it moves the public perception of high-performance computing in the wrong direction by separating "us" (the HPC technologists, or "supercomputerists") from "them" (the research community and the public at large).  The reality is that HPC is evolving in the opposite direction--more and more problems that were traditionally outside the computationally intensive domain of supercomputing are now becoming tractable due to an increasing emphasis on improving non-floating-point performance (e.g., data access and throughput for analytics, integer performance for graph problems).  At my workplace alone, we are giving supercomputer time to projects in non-traditional fields ranging from cinematic arts to mathematical anthropology to great effect.

In fact, getting a wider range of "them" to use "our" supercomputers has become such a widely recognized priority that the National Science Foundation has dedicated hundreds of millions of dollars to establishing a concerted, national effort to break down these barriers and provide an easy path for researchers to transition from their desktops and workstations to some of the fastest and most innovative supercomputers in the world.  The name of the game is to make HPC less intimidating and as similar as possible to what people already know how to do on their personal computers.

Perhaps this is getting a little off track from the literal implications of the statement that "supercomputers are NOT computers," so I think it's worthwhile to visit each of the two implications I mentioned above in more detail.

#1: The perceived disconnect between desktop computing and supercomputing

Saying that "supercomputers are NOT computers" implies that supercomputers are somehow vastly different from your laptop.  This is a falsehood on two levels: the hardware level and the software level.

Supercomputers use the same hardware

Cray 2 (an old supercomputer)
Supercomputers used to be big colorful SMP vector machines (such as the Cray 2 pictured), featuring custom processors (CPUs) and unique architectures.  It followed that custom compilers were needed to build applications for these special-purpose CPUs, and since the way those CPUs did meaningful work varied across supercomputers, applications had to be written in different ways to get the most performance out of those supercomputers.  This held true for decades, but I'd argue that NEC's Earth Simulator, dethroned from the #1 fastest supercomputer in the world by IBM's Blue Gene/L in 2004, was the last of these super-powerful, exotic, and exclusive vector machines (LANL's Roadrunner was a bit of an anomaly).

A more modern Cray XE6.  Pretty boring.
Nowadays the processors, memory, and overall architecture of each compute node in your typical high-performance computer are literally no different from what you can buy in a workstation; the key differentiator in supercomputers is the high-performance networking that ties all of these nodes together.

It follows that supercomputers realize their super performance by running parallel applications--applications that can efficiently use many CPU cores to tackle a single problem--across many nodes at once.  This sort of parallelism isn't unique to supercomputers anymore either, though.  Today's laptops and desktops (and even modern-day cell phones) all have multicore CPUs, and just like with supercomputers, they can run parallel scientific applications to great benefit.
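Just to drive this home, here's a minimal sketch of the kind of parallelism I mean.  I've used Python purely for illustration (real HPC codes are more often compiled C or Fortran); the point is that this exact script runs on any multicore laptop, splitting one big sum across four worker processes the same way a supercomputer splits work across nodes:

```python
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum the integers in [lo, hi) -- one worker's share of the problem."""
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    # Split 0..999,999 into four equal chunks and sum them on four cores.
    chunks = [(i * 250_000, (i + 1) * 250_000) for i in range(4)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # 499999500000
```

Nothing about this program cares whether the four workers live in a laptop or a compute node; only the scale changes.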

Even some of the more exotic supercomputers that have accelerators, such as ORNL's Titan, aren't fundamentally different from your laptop.  Titan derives the majority of its peak performance from the NVIDIA K20 Kepler accelerator cards built into each node, but in reality these are just very beefy versions of the graphics cards that may have come with your laptop or desktop.  As such, a program designed to run on Titan's Kepler accelerators would also be able to run on your laptop's graphics chip.

Supercomputers use the same software

The vast majority of the applications running on today's supercomputers will also run on your personal laptop--including codes that rely on specialized hardware like NVIDIA GPUs and Intel MICs--because of open standards.  The truth to the statement that "supercomputers use the same software" as your laptop can be divided into three important layers:

A. The application level

The applications that people use on modern supercomputers will actually run just fine on a laptop or a desktop as well--just not as speedily.  In fact, many of the applications that users run on our supercomputers have never evolved past the desktop; users run them on our big machines because they can run hundreds or thousands of calculations at once.  By comparison, their local desktop or computer lab would only be able to support half a dozen calculations running concurrently.

Yet, to suggest that these cases are inappropriate use of HPC is total nonsense because these problems are often wholly intractable on non-HPC resources.  A really nice example of using desktop-class software for tremendous scientific gain on a supercomputer lies in a recent bioinformatics project I supported: a few researchers were given 450 human genomes (50 terabytes of compressed data) that needed to be processed as quickly as possible.  It turned out that the applications used to analyze these data were serial or multi-core at best, but they placed tremendous demand on storage capacity and I/O operation capability.  Without the large number of compute nodes and SSDs in our supercomputer, this project literally would have taken years to complete.  Our research partners were able to get it done in a month and a half.**

** This project was such a nice case of how supercomputers can be used in nontraditional ways that I'm presenting it at SC'13 this year in Denver.

B. The library/API level

Even with some of today's most exotic supercomputers such as IBM's Blue Gene/Q or Cray's XC30, porting applications from small-scale to these monstrous machines is made significantly easier because of the standardization of many high-performance libraries and application programming interfaces (APIs).  The underlying communication infrastructure (Blue Gene's five-dimensional torus, XC30's dragonfly topology and Aries ASIC, Infiniband, or whatever else) may be wildly different on different supercomputers, but they all provide MPI and OpenMP libraries that present a uniform programming interface to users and hide all of these hardware differences.

As a result, as long as an application is written using the correct MPI or OpenMP calls, it should compile and work on any computer (from laptop to supercomputer) that provides MPI and OpenMP libraries.  MPI and OpenMP are available for literally every vaguely modern computer I've encountered, and as a case in point, I've tested the same OpenMP-enabled parallel code I used for my dissertation on over fifty different computer systems with six different ISAs, dozens of processor generations, and a handful of different compilers, ranging from single-core laptops to Blue Gene/P.  Never once did I have to modify the source code to get the desired results, providing pretty compelling evidence that, indeed, open-standard libraries mean that there's little in the way of porting scientific applications from a laptop to a variety of exotic supercomputers.
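For the sake of illustration, here's a minimal sketch of what such hardware-agnostic parallel code looks like.  I've written it in Python with the mpi4py bindings as an assumption on my part (my own dissertation code was OpenMP-enabled compiled code, but the portability argument is identical), and the fallback clause means the very same script runs on a laptop with no MPI library installed at all:

```python
# Minimal sketch of hardware-agnostic parallel code.  mpi4py is assumed
# here for illustration; the same argument applies to MPI/OpenMP codes
# in C or Fortran.  With no MPI installed (e.g., on a plain laptop),
# the script degrades gracefully to a single "rank."
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    comm, rank, size = None, 0, 1

# Each rank integrates its own slice of f(x) = x^2 over [0, 1]; the
# slicing is identical no matter what network ties the ranks together.
n = 100_000
local = sum((i / n) ** 2 / n for i in range(rank, n, size))

# Combine the partial results (a no-op in the serial fallback).
total = comm.allreduce(local) if comm else local
if rank == 0:
    print(f"integral of x^2 on [0,1] is about {total:.4f}")  # ~0.3333
```

Whether this runs on one core or ten thousand, the source code never changes--only the `mpirun` invocation does.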

C. The OS level

This is where a potentially big disconnect between personal computer and supercomputer can happen, as a lot of desktop computer users are only accustomed to Windows.  Mac users are in much better shape, as Mac OS X is actually UNIX under the hood and provides the same software frameworks and standards compliance that all Linux-based (and UNIX-based) clusters and supercomputers use.  From the user perspective, compiling a scientific application to work on a MacBook is virtually identical to compiling an application to work on a supercomputer, and many scientific applications have precompiled binaries that work on Windows, Mac, and Linux-based supercomputers.

The biggest difference between desktop computing and HPC often lies in users having to interact with a terminal to manage their files and submit jobs to a supercomputer.  Anecdotally, I can say that most people who begin using the command line to do their scientific computing don't find this to be a huge hurdle though.  A wealth of information and tutorials (including a guide to command-line Linux for completely new users that I wrote) exists online, and of the dozen or so undergraduate researchers I've trained, all have gotten the hang of interacting with bash, using vi, and submitting jobs within two or three days.

#2: The need for "expertise in the technology"

I cannot argue that using supercomputers is completely intuitive and doesn't require some amount of training; despite the fact that the same applications run on laptops and supercomputers, getting into any of the nontrivial features of a given machine (designing workflows, optimizing algorithms, and even basic troubleshooting) requires an equally nontrivial understanding of the "super" aspect of the supercomputer.  To this end, a large part of my day job (and, as it turns out, my free time) is in developing training materials, giving talks and tutorials, and writing documentation.  Supercomputers can be obtuse, quirky, unintuitive, and aggravating.

However, I strongly disagree with the author's statement that HPC users need "expertise in the technology" to use supercomputers.  Both academic and industrial supercomputing are moving towards broadening inclusivity, and a tremendous amount of effort is going into literally making supercomputing completely transparent to the typical desktop computer user.  Here are a few examples of this.

Desktop applications capable of supercomputer offloading

There are a host of desktop applications, complete with user-friendly point-and-click graphical user interfaces, that can transparently integrate with supercomputers to offload calculations.  That is, all users have to do is download and install these programs on their local desktop as they would any other program, follow a short setup tutorial on exactly how to hook into a supercomputer, and then use the application from their desktop.  The application abstracts away the supercomputer's command-line terminal, providing a seamless way to literally supercompute with close to zero technical expertise.

Matlab is a really nice example of this, as it is really becoming an industry-standard application suite and programming language for scientists and engineers.  It's got simple syntax and a rich set of mathematical features that allow its users to easily accomplish mathematically complicated tasks, and an increasing number of non-traditional users have been coming to us with Matlab codes that are beginning to take too long to calculate on their desktops.  When this happens, using our supercomputers to speed up their calculations is often literally a matter of following our online tutorial on how to interface Matlab with Gordon the first time, changing one function call in their Matlab code, and clicking the "run" button.

Similarly, VisIt (and other visualization applications like ParaView) can transparently interface with supercomputers to do complex visualization and rendering.  Again, setting this up is all documented in a simple tutorial, and both of these applications are written with scalability and high performance in mind.  Despite the high-performance capabilities of the code, the user doesn't have to worry about any of this beyond the initial one-time configuration.  The only difference between running purely on the desktop and running on the supercomputer backend is a simple click.

Granted, these codes both sometimes require manual intervention on the supercomputer such as transferring files, or in the case of visualization, generating the data to be visualized through some other job.  However, there exists another mode of supercomputing via a desktop-style interface that can completely mask the technical specifics of supercomputers from researchers.

Web- and mobile-based HPC applications and portals

Using web-based frontends to completely abstract the supercomputer is an emerging trend in supercomputing that is growing by leaps and bounds because it puts the utility of supercomputing in the hands of researchers who have literally zero experience in supercomputing.  This level of abstraction, where the user interacts entirely with a website rather than a terminal, can deliver supercomputing to anyone who can navigate a website--the diametric opposite of the sentiment that supercomputers are a special technology that requires particular expertise.  It completely removes the need for users to download any particular software, use any particular operating system, or really use any particular device.

I suspect a lot of traditional supercomputerists (the ones who think supercomputers are NOT computers) will be quick to dismiss these sorts of user-friendly frontends as pie-in-the-sky ideas that never actually see use.  To that, I provide these facts to the contrary:

Credit: XDMoD
  • Community gateways, or websites that provide a subset of HPC applications to registered users through a web frontend that accepts input from users and delivers output, are blowing up.  The CIPRES gateway, for example, has been seeing exponential growth for years, has resulted in over 600 peer-reviewed publications, and represents 14% of all the core hours burned on SDSC's Trestles resource in 2013 so far.  CIPRES is producing real science for a nontraditional research community using supercomputers, and none of those researchers ever have to know what's going on at the supercomputer level.
  • NSF has taken notice of the utility of these gateways in reaching new communities and had specific language in their last round of big-iron funding proposals to provide the infrastructure and backend for these gateways and web-based workflow interfaces.  SDSC's upcoming Comet machine, a $12 million hardware investment by the NSF, was awarded on this basis.  It will provide many new mechanisms specifically designed to make supercomputers easier for new users to use--again, the diametric opposite of requiring "expertise in the technology."  Users can focus on having "expertise in their domain" and not worry about the technology.
  • The XSEDE program, also funded by the NSF, has full-time staff who are experts in gateway development and whose sole responsibility is providing long-term consulting for research communities who are trying to develop a gateway.  XSEDE also provides a rich set of documentation about gateway development and integration, and most XSEDE resources provide interfaces for GRAM, UNICORE, and Globus with the goal of abstracting away the differences in system software configurations from gateway developers.
In addition to these heavyweight web-based front-ends, a range of lightweight software is emerging that allows users to operate via a web interface while maintaining much finer-grained control over what's happening on their supercomputer.  IPython Notebook is a great example; although physically running on a compute node within a user-submitted job, IPython Notebook presents a temporary website to which a user can connect and interactively use Python directly from the browser on their laptop.

Since these interfaces are entirely web-based, it follows that they are actually decoupled from a user's laptop or desktop entirely--supercomputers can, in principle, be used through mobile devices.  Although I admit that this is beginning to get into that pie-in-the-sky realm, TACC has been doing some really innovative stuff to bridge HPC and mobile devices.  Their recently announced TACC app for iOS is remarkably feature-rich; it lets you interact with user support staff and monitor your jobs from the comfort of your toilet.  

While this is perhaps a bit silly, the forward-looking implications are quite interesting.  A day may soon come where a researcher can input data (audio, video, GPS, weather conditions) directly from a mobile device, have it processed on a supercomputer in some meaningful way, and have updated results delivered right back to that device in near real-time.

For an example closer to reality, consider the work that Pratt & Whitney (among others) are doing to develop real-time simulation of engine performance and health.  The idea is that, if a computer can simulate the conditions of how the engine in a flying aircraft will be performing a few minutes into the future, engine performance can be pre-optimized and failures can be detected earlier.  Right now these sorts of computations are done on-board, but what if wireless communication could offload the input data (engine parameters) to a ground-based supercomputer, where the predictive model could be calculated and the results sent back to the aircraft in-flight?

Moving forward

There are a number of other little problems with the analogies in the HPCwire article, and I realize that no analogy is perfect.  However, I do think all of the analogies given are becoming less relevant as the state of research supercomputing moves forward.  As a final case in point, the two simple analogies given relate supercomputers to a single device (an F1 car or a shovel) to be used by a single person to accomplish something faster (drive around the track or dig a hole).  This was once true, but that time has passed.

Rather, the reality is that very few supercomputers are doing hero runs (or even several large capability runs) all the time; supercomputing workloads are increasingly capacity-driven, not capability-driven, and this imbalance will only tip further towards capacity as the field attracts newer communities whose codes don't scale to the increasing core counts in modern machines.

If there's a message to be taken away from this long exposition, it's that supercomputers ARE computers.  They're not a magical instrument to be used by a select privileged few.  They don't have to be hard to use, and they can run the same scientific applications that run on desktops.  They're a shared resource, and unlike an instrument like an expensive scope, they're pretty hard to seriously damage.  There are a lot of people out there trying to make them as easy as possible for new users to get on and start being productive, and I'm one of them.

So, please don't share the analogy that supercomputers aren't computers with the public.  They are computers, and they're not as different from desktops as technologists might wish they were.  It doesn't help to tell the public that supercomputers are so complicated that they cannot be understood by Freecell-playing researchers.  It's just not true anymore.

Friday, October 18, 2013

Track 2 and Comet/Wrangler

Just before our federal government shut down, the National Science Foundation announced that SDSC would be awarded a $12 million piece of its most recent ACI Track 2 solicitation, Track 2F.  The Track 2 program, initially designed to award $30 million (on a competitive basis) every year to deploy a new supercomputer into XSEDE, has funded the majority of the compute cycles available via XSEDE.  Here's a brief summary:
  1. Track 2A funded Ranger (OCI-0622780), a 479-teraflop Sun Constellation cluster at TACC
  2. Track 2B funded Kraken (OCI-0711134), a Cray XT machine that evolved into a petaflop machine at NICS
  3. Track 2C was going to fund a 197-rack SGI Altix UV at PSC, but SGI went bankrupt
  4. Track 2D funded three separate projects of an experimental design:
    1. Gordon, a data-intensive supercomputer at SDSC (OCI-0910847)
    2. Keeneland, a GPU-rich supercomputer at NICS (but led by Georgia Tech; OCI-0910735)
    3. FutureGrid, a physically distributed cloud testbed led by Indiana and housed all over the US (OCI-0910812)
  5. Track 2E funded Stampede, a 2.2-petaflop machine with an additional 7.4 petaflops of Intel Xeon Phi coprocessors (OCI-1134872)
  6. Track 2F will be funding
    1. Comet at SDSC (initially named Wildfire; ACI-1341698), and
    2. Wrangler at TACC (ACI-1341711)
This history of Track 2 should illustrate how much of a big deal these awards are, and it's pretty exciting to be involved in the process of proposing, winning, and deploying a machine of this scale.


Unfortunately, the details on these two Track 2F systems are sparse, and deliberately so.  Both systems will be deployed in late 2014 and put into production by 2015, meaning the hardware that was proposed and awarded hasn't even hit the market yet and probably remains, quite literally, an industry secret.

Both the SDSC press release and the NSF award abstract for Comet emphasize that Comet will be more a capacity machine than a capability machine, and it will serve the needs of the "98%" of XSEDE users who aren't running massively parallel jobs.  While this figure may sound surprising, it's really true--verify yourself by logging into XDMoD and following this procedure:
  1. click the usage tab
  2. collapse the "Jobs Summary" option on the left, then expand the "Jobs by Job Size" option
  3. click the "number of jobs" option
  4. along the top (under the row of tabs), change "Duration:" to "previous year"
In some sense, the overall mission of this machine will be a lot like SDSC's Trestles resource, which is specially tuned for fast turnaround times and small jobs.  The big difference is that Comet's acquisition cost is 4x larger than Trestles's was, and its target peak FLOPS are 18x-20x higher.

This isn't to say that Comet will be a completely vanilla system like Trestles tends to be though; in the press release, SDSC's director made the bold statement that Comet will be "the world’s first virtualized HPC cluster."  This admittedly sounds a bit hokey since I've detailed why cloud computing is no good for HPC, but Comet will actually be quite different in that its use of SR-IOV will allow virtual sub-clusters to run applications over Infiniband at near-native speeds.  A few colleagues and I did a good amount of application testing and benchmarking to make sure that the approach was viable, and I'm excited to see this stuff make it into production.

Speaking of sub-clusters, the press release also says that each Comet rack will have 72 nodes connected at full-bisection FDR bandwidth.  The inter-rack links will be oversubscribed (4:1) like Gordon's inter-switch torus links (16:3), suggesting that 72 nodes will be a magical number in terms of maximum optimal job size akin to Gordon's 16-node magic size.  How these architectural details translate into policies remains to be announced by SDSC, and hopefully a lot more juicy details about Comet will be released next month at SC.  Last I heard, Comet will be officially announced at SDSC's booth on Tuesday afternoon.


HPCwire ran a very enlightening article on Wrangler based on what I assume was an interview with co-PI Chris Jordan, and in doing so, spilled some beans about Intel's upcoming Haswell-based Xeon processors.  I am hesitant to comment further since I know that numbers like the cores-per-processor mentioned in the HPCwire article haven't even been announced by Intel yet and thus may be a non-disclosure minefield, but the HPCwire article and the NSF award abstract do highlight a few interesting points about Wrangler:
  • it will have "3000 next generation" Intel cores
  • it will be a "Dell-supplied 120 node cluster"
  • the award payout was only $5 mil instead of the $6 mil put forth in the Track 2F solicitation (although this may have to do with the end of FY'13, the government shutdown, etc)
The cost-per-node works out to be ludicrously high; by comparison, SDSC's Trestles system has 324 nodes each with 32 cores, 64 GB RAM, and an SSD for a measly $3 million.  The fact that twice the hardware budget is buying a third of the nodes suggests that Wrangler is going to have some fancy (read: expensive) magic under the hood.  The Track 2 awards are historically very cost-sensible though, so it's highly unlikely that the NSF got taken for a ride by awarding this proposal.  DSSD, a company founded by Andy Bechtolsheim (who also founded Sun Microsystems, Arista, and had a hand in developing Sun's famed Thumper), is a key strategic partner in Wrangler too, so there might be some really revolutionary ideas arising from this award.
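The back-of-the-envelope arithmetic behind that claim, using only the award figures quoted above:

```python
# Cost per node from the award figures: Wrangler's $5M buys 120 nodes,
# while Trestles's $3M bought 324 nodes.
wrangler_cost_per_node = 5_000_000 / 120   # ~ $41,700 per node
trestles_cost_per_node = 3_000_000 / 324   # ~ $9,300 per node

# Wrangler's nodes cost roughly 4.5x as much apiece.
print(round(wrangler_cost_per_node / trestles_cost_per_node, 1))  # 4.5
```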

Again, I look forward to hearing more details in Denver next month.