Inside HPC: Q&A with SDSC's Allan Snavely
Allan Snavely is the Director of the Performance Modeling and Characterization (PMaC) Laboratory at the San Diego Supercomputer Center and Assistant Adjunct Professor for the Department of Computer Science and Engineering at the University of California, San Diego.
Q. Why are 'real benchmarks' important in high-performance computing?
Snavely: A wise man once said, "what you measure is what you get."* He was talking about student aptitudes, but the same holds true in many domains, including high-performance computing (HPC). There is wide consensus now that what we want in HPC is fast time-to-solution for scientific simulations. Unfortunately, for a while now we have been measuring something different: something sometimes facetiously called "Macho Flops."
* Reference: Hummel, J., & Huitt, W. (1994, February). What you measure is what you get. GaASCD Newsletter: The Reporter, 10-11.
Q. What is the significance of the Top 500 List? Likewise, what is the tyranny of the Top 500 List?
Snavely: The Top 500 List provides a ranking of the world's computers based on their performance on the High-Performance Linpack (HPL) benchmark, a test that tracks Macho Flops, that is, nearly the peak performance of a machine in floating-point operations per second under ideal conditions. Most people understand that "real performance will probably be less," but many people still think that the Top 500 List serves as a useful indicator of relative performance. That is, they assume a machine higher on the list would run their application faster than a machine lower on the list (though probably not as fast as HPL). This isn't true. In a paper forthcoming at SC05, we show that HPL is a poor predictor of the relative performance of real applications. As a predictor, HPL has a 60 percent average error and a 60 percent standard deviation across a large set of applications and many of the architectures on the Top 500 List. That means there is a good chance a machine higher on the list will actually run an application slower than some machines below it.
HPL is easily beaten as a relative performance predictor by other simple benchmarks such as STREAM (which measures bandwidth to main memory) and RandomAccess (which essentially measures latency to main memory). The widely held misconception that relative Macho Flops is a good indicator of relative real performance reminds me of other "obvious" facts, such as that heavy objects fall faster than lighter objects, a "fact" so "obvious" that no one thought to actually check it before Galileo! A simple test of the hypothesis shows it to be false. (I am not comparing myself to Galileo, but I don't mind comparing my good friend John McCalpin, author of the STREAM benchmark, to Galileo.)
The HPC Challenge Benchmarks include HPL, STREAM, and RandomAccess. The paper* we'll be presenting at SC05 has details on how one may improve the Top 500 List by using a combination of these. However, my work and the work of the Performance Modeling and Characterization (PMaC) lab is mostly focused on understanding and enabling fast time-to-solution for real applications, rather than on simplistic rankings.
* Reference: Carrington, L., Laurenzano, M., Snavely, A., Campbell, R., & Davis, L. (2005, November). How well can simple metrics represent the performance of HPC applications? SC05, Seattle.
Q. Tell us about your experiences as session lead for the Cyberinfrastructure+Social Sciences workshop.
Snavely: Speaking of workshops, I also want to mention the recent NSF Workshop on Benchmarks, where I gave an invited talk. NSF is holding workshops and working groups on benchmarking to inform future procurements, such as the one in the current NSF Office of Cyberinfrastructure request for proposals. At the talk, several people opined that no matter what other benchmarks are proposed, winning proposals have to maximize Top 500 rank for the dollar! This is a very pernicious mindset that must be stamped out if we are to deliver systems capable of reducing time-to-solution on real scientific applications.
Here's what is wrong with that mindset. Scientific applications need to perform floating-point operations, but they also need to move data. They need to bring in operands for the floating-point operations from memory. Consider a mathematical expression like A = B + C, with three operands and one floating-point operation. To carry out this calculation, the computer has to move B and C in from memory, calculate the addition, and store A back to memory. On today's machines, the step of fetching the operands takes several tens to several hundreds of times longer than just computing the Flops. How long it takes depends on where in the memory hierarchy the operands reside. If the operands are in small, temporary memory near the processor (called cache), access is relatively fast; if they must come from main memory, it is slow. The simplistic HPL benchmark performs a lot of floating-point calculations but relatively few operand fetches from main memory, whereas most real applications perform many more operand fetches, and more of them from main memory. A balanced architecture, to reduce time-to-solution, should maximize the critical resource, not the cheap one. In other words, to enable fast scientific simulation, a balanced machine would more likely maximize data-moving capability for the dollar than floating-point capability for the dollar.
The SBE/CISE workshop was an amazing opportunity to reach out to domain scientists in areas not traditionally served by HPC. I met linguists and psychologists with emerging computer applications that will soon require HPC resources. I also met economists with good ideas for improving resource allocation in HPC and the Grid. The economists rapidly grasped the pernicious nature of Macho Flops, and we had good talks about the influence of false market indicators on market efficiencies (admittedly this was over beers at the pub). The economists observed that information opacity hampers efficient market dynamics.
In HPC this means that people who want a simple indicator of relative performance think they have one in the Top 500 list, but because it does not predict real performance well, huge inefficiencies arise whereby systems are designed, built, procured, and deployed that in fact are not best suited to the target workload.
Q. What are your views on the future of HPC?
Snavely: I'm excited to go to work every day! I feel we are in a time of HPC renaissance. As part of my job, I sit down with applications people to help them understand and improve performance. People are using supercomputers these days for tasks ranging from modeling the evolution of the universe from the Big Bang, to predicting the migration paths of species, to searching for a possible fifth fundamental force beyond the strong, electromagnetic, weak, and gravitational forces, to modeling human cognition, and many other fascinating uses. Thus, computational scientists are going where no experimental or theoretical scientist could ever go before by using this "third way" of science, that is, science via simulation.
The future is dazzling, and I'm encouraging my three-year-old daughter to pursue a career in supercomputing. (My wife says that's OK, but that she has to go to a college with a women's Top 10 Division I lacrosse team, such as Dartmouth, Princeton, or Johns Hopkins, but sadly not UCSD.)
As to the current sad situation regarding Macho Flops, it is only a bump in the road. I'm encouraged by many recent trends, including the High End Computing Revitalization Task Force (HECRTF) workshop, for which I was vice-chair for Performance, with Horst Simon as chair. The workshop report contained a vision for benchmarking and procurement of supercomputers that takes into account the real resource demands of applications, and I am encouraged that NSF seems to be adopting an approach like this. My lab has been working for a few years within the Department of Energy (DoE) Scientific Discovery through Advanced Computing (SciDAC) project in the Performance Evaluation Research Center (PERC), in collaboration with the Department of Defense High Performance Computing Modernization Office, and in the High Productivity Computer Systems (HPCS) program, funded by the Defense Advanced Research Projects Agency (DARPA), DoE, and NSF. In all of these efforts we're working with our colleagues to advance a methodology for characterizing the resource demands of applications and mapping them to the low-level capabilities of machines as measured by simple benchmarks such as HPL, STREAM, and RandomAccess, and thereby to better understand and predict performance.
If we can measure the right things, we can make the case for supercomputing. In other words, if we can quantify what scientific simulations need performance-wise from machines, we can help architects, centers, and agencies make the monetary case for balanced systems.
Q. What do you get out of cycling that you enjoy?
Snavely: Cycling keeps me fit and (relatively) sane. No one would be surprised to hear that, given Southern California gas prices and traffic, one can get back and forth to work faster and cheaper by bike. I will point out, though, that I live 25 miles north of SDSC, so the commute is 25 miles each way. Perhaps that puts the gas price and traffic congestion problem into true perspective: I can get back and forth faster and cheaper, and I live 25 miles away! In the interest of full disclosure, I only do part of my commuting by bike and drive the other days. I always regret it when I drive, though. My commute by bike takes me either on the Coast Highway and then up into the hills of San Marcos, or "the back way" through the estates of Rancho Santa Fe, by Lake Hodges, and through the Elfin Forest to home. So on any given gorgeous day in Southern California, I can choose whether I want to ride past waves and woods or horse country and lakes. I also enjoy surfing as time allows, but commuting by bicycle certainly fits my busy schedule better!
A cycling anecdote with a moral: I participated in the Tour de Palm Springs a while back. It is a "century," a hundred-mile loop around the beautiful valley of Palm Springs. It is not really a race, but of course many people treat it like one. I was near the end, following in the wake of a bunch of strong riders. In cycling, a weaker rider can keep up for a while by drafting. I was about to "go out the back," which is cycling talk for losing contact with the fast group. We came to a turn, and everyone went straight except for me and a woman (the two slowest, at the back). We noticed that the sign actually said to turn right. So we turned right, and we beat all those fast guys to the line. In cycling, as in HPC, it is not enough to be fast; you must also be going the right way!