Toward Petascale Applications: Q&A with Amit Majumdar
Amit Majumdar is the group leader of SDSC's Scientific Computing Applications group and directs the Strategic Applications Collaboration (SAC) program in the User Services division. He has been at SDSC since 1997 and holds a Ph.D. in Nuclear Engineering and Scientific Computing from the University of Michigan.
You work with SDSC's Strategic Applications Collaboration, or SAC, program. Tell us a little about the goals of the program.
Majumdar: In a nutshell, the goal of our SAC efforts and the related Strategic Community Collaborations (SCC) is to help users achieve breakthrough computational science and engineering that they simply couldn't do before. We help them push the envelope of their applications on the Center's state-of-the-art high performance computing, data, and software resources, and we have a demonstrated track record of collaborating and scaling up applications for today's largest machines, especially for the data-intensive applications that are SDSC's strength. We do this by developing close collaborations between SDSC's expert computational scientists and the researchers who use SDSC resources. An important key to these collaborations is the multidisciplinary expertise of SDSC's computational scientists, whose backgrounds range from engineering, biosciences, chemistry, astronomy, and physics to geosciences. Complementing this, our group also brings expertise in high performance computing architectures, software, and parallel computing. I must add that our SAC work is a team effort, and it's rewarding to work with the skilled group we have at SDSC; you're always learning something new. Finally, another important goal is that we try to leverage our efforts by developing methods or solutions that are general, so they can be used by the wider computational science community.
Tell us about some of your experiences working with users.
Majumdar: One thing that's interesting is that because we collaborate with users from many disciplines and NSF directorates, we develop a broad vision of computational science across many fields, from the physical sciences to biological and medical sciences, as well as engineering and such relatively new user communities as the social sciences. There are lots of examples of successful SAC projects; you can read about them on our website and in EnVision magazine. Many of these applications have both a high performance computing and a data emphasis. We've enabled accurate protein structure prediction and protein design, high-resolution turbulence calculations, and new molecular dynamics simulations to help more efficiently convert plant biomass to ethanol. At the other end of the size and time spectrum, we've helped run high-resolution astronomy simulations of the evolution of the universe, and detailed 3-D simulations of earthquake-induced ground motion in southern California that produced as much as 50 terabytes of data per simulation.
In many of these projects we've been able to help scale up the codes by orders of magnitude in the number of processors they can run on, which helps the researchers cross new thresholds of higher resolution and speed. For example, a recent SAC project was in protein structure prediction in the Critical Assessment of Structure Prediction (CASP) competition. SDSC's work resulted in the first structure prediction to be completed in a matter of hours instead of weeks, by enabling the code to run on more than 40,000 processors of the IBM Blue Gene Watson system. We also helped scale up a sophisticated Direct Numerical Simulation turbulence code by implementing a new domain decomposition technique, so that it recently ran on some 16,000 processors of the Watson system, successfully completing part of a 4096^3 size computation, the highest resolution yet attempted in the U.S.
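The interview does not say which domain decomposition technique was used for the turbulence code. As a rough illustration of why the choice of decomposition matters at this scale, the sketch below assumes the common contrast between a 1-D "slab" decomposition, where each process owns whole planes of the grid, and a 2-D "pencil" decomposition, where each process owns a bundle of grid lines. The function names and the 128 x 128 process grid are hypothetical and chosen only to land near the "some 16,000 processors" figure; they are not taken from the SDSC code.

```python
# Illustrative sketch only: slab vs. pencil decomposition of an N^3 grid.
# All names and the 128 x 128 process grid are assumptions for this example.

def max_ranks_slab(n: int) -> int:
    """1-D decomposition: each process needs at least one full n x n plane."""
    return n

def max_ranks_pencil(n: int) -> int:
    """2-D decomposition: each process needs at least one full grid line."""
    return n * n

def pencil_shape(n: int, p_rows: int, p_cols: int) -> tuple[int, int, int]:
    """Local block owned by one process on a p_rows x p_cols process grid."""
    if n % p_rows or n % p_cols:
        raise ValueError("grid size must divide evenly in this simple sketch")
    return (n, n // p_rows, n // p_cols)

if __name__ == "__main__":
    n = 4096
    print("slab limit:  ", max_ranks_slab(n))
    print("pencil limit:", max_ranks_pencil(n))
    print("local block on a 128 x 128 grid:", pencil_shape(n, 128, 128))
```

Under these assumptions, a slab decomposition of a 4096^3 grid cannot use more than 4,096 processes, so running on roughly 16,000 processors requires splitting the grid along more than one dimension.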
While all of our work adds up to advance science and engineering, it's especially exciting when a project also has some direct human impact. For example, in a nuclear medicine project the researchers are now able to do patient-specific Monte Carlo simulations that can improve a doctor's ability to design safe and effective cancer treatments. And in the earthquake-related projects we've done, the large scale simulations are generating data and insights that can play a role in developing better building codes and safer buildings, potentially saving many lives and billions of dollars.
What do you think the challenges will be for users in developing petascale applications?
Majumdar: The next few years will be very exciting for HPC and computational science and for computational scientists like us. We'll begin to see petaflop machines installed at various sites, and the big challenge will be scaling up applications to efficiently run on petaflop machines that have hundreds of thousands to millions of processors, innovative interconnects and I/O. There will possibly be new multi-core architectures or heterogeneous processors and many other changes such as a new parallel programming paradigm that will potentially involve a global address space in addition to the traditional message passing programming. Hand-in-hand with the growing computing power and numbers of processors we will be dealing with the mushrooming size of data, both simulation and observational data.
To meet these challenges, SDSC is playing a central role in weaving together a number of interlinked research topics to open the door for petascale applications. First and foremost is to apply our broad, multi-domain experience to work closely with researchers and identify the petascale science that offers the greatest opportunities for them to make real advances. And we'll also help identify the computational requirements such as domain size, mesh resolution, and time steps, and match those with the architectures of future petaflop machines. We'll also need to thoroughly profile and understand the computer science characteristics associated with current applications such as memory usage, CPU, interconnect, and I/O, along with the computational science methods used in the applications. It will take integrating all of these factors to see whether we can scale up their current applications, for example by improving single processor performance, I/O performance, communication performance, or implementing different domain decomposition techniques, or whether we'll have to work with the domain scientists to implement, from scratch, new algorithms or numerical methods. This research will be both exciting and challenging, and SDSC has a real contribution to make by doing this for all the disciplines of computational science and engineering.
Your field, computational science, is interdisciplinary. Tell us about your background.
Majumdar: My graduate work was in the interdisciplinary field of nuclear engineering and scientific computing, so I learned computational science as part of my research. I used Monte Carlo methods for my Ph.D. thesis work and have been involved with and interested in parallel algorithms and parallel machines since my graduate student days. This experience made me a good fit for my current position at SDSC, where we have to know both the science and engineering side of things to be able to "speak the language" of the users, and at the same time we need to understand the intricate issues of high performance computing. As you scale things up in terms of computational problem size and data size, entirely new challenges appear that simply aren't issues with smaller computations. So it's never boring; we're always facing new challenges and opportunities.
What do you like to do for fun?
Majumdar: I like to read; some recent books I've enjoyed are by authors like Kenzaburo Oe, Kazuo Ishiguro, V. S. Naipaul, Eddy Harris, and Sujata Massey. I also like to go to coffee shops, watch college sports, and meet up with my colleagues and friends at restaurants in San Diego, and I go to the gym a few times a week. I live pretty close to the UCSD campus and I enjoy the walk to work every morning through the campus and the evening walk back home.
Find more information about SDSC's Strategic Applications Collaboration (SAC) program online at: http://www.sdsc.edu/us/sac/