The tremendous explosion of computational technologies has raised a new kind of question for each scientific discipline to answer. Instead of asking, "Which computer is best for my problem?" scientists must now ask, "What kind of computational environment do I need to examine, pose, and answer my questions best?" The answers "comprise a revolution in our thinking about computational sciences," said SDSC senior principal scientist Kim K. Baldridge, who leads a project in tool development for computational chemistry within the National Biomedical Computation Resource (NBCR) at SDSC. "We have been working on the components of an answer in computational chemistry for many years, in some cases, and now we are finding that our answers have reformulated our questions!"
Figure 1. Triquinane Transitions
This image was produced by Greg Johnson of SDSC using his MPIRE visualization code. With appropriate equipment, it may be seen as a three-dimensional rendering of the transition state between two carbocation structures of the molecule triquinane. Red and dark blue show regions of high electron occupancy (but opposite orbital phase), fading to low-density greens. The calculations were done by Kim Baldridge at SDSC.
Baldridge presented the work of her NBCR group and collaborators (Figures 1 and 2) at a symposium on New Computational Architectures in Chemistry, held at the annual meeting of the American Chemical Society (ACS) in San Diego in April 2001. A report on the symposium by Elizabeth K. Wilson was published as a cover story in the April 30, 2001, issue of the ACS weekly magazine Chemical & Engineering News.
Multiple Machines, One Environment
Baldridge and SDSC scientist Jerry Greenberg have led the effort to create grid-based tools for computational chemistry within NBCR, which is supported by the NIH National Center for Research Resources. "Chemists have a wide variety of computing needs," Baldridge explained. "Because our group has evaluated and used a variety of high-performance computer architectures, we have gained a thoroughgoing appreciation of their similarities and differences and of the way in which the architectural variety actually presents new opportunities for a unified, grid-based approach to the broadest range of chemical problems."
The computational engine of the running environment is GAMESS, a code that originated in the research group of Mark S. Gordon, of which Baldridge was a member. (Gordon is now at Iowa State University, Ames, where he also directs a Department of Energy Scalable Computing Laboratory.) GAMESS is a widely used, open-source, ab initio quantum chemistry package. As a co-developer of the code, Baldridge has worked since the late 1980s to make GAMESS available on a variety of platforms and with a variety of enhancements that now constitute an environment for execution and visualization. She has also made algorithmic enhancements that extend the range of quantum chemical prediction, including recent developments to handle solvation.
"We now think not just about the high-performance computers and their architectures, but about grid-based and data-oriented tools that enable visualization and analysis on the fly," Baldridge said. In particular, she and Greenberg have worked to layer their code, QMView, on top of the GAMESS running environment. QMView can begin with GAMESS output files and deliver both 2-D and volume visualization in a variety of modes. For example, it enables 3-D viewing of electron density, calculated molecular orbitals, surface volumes, and difference densities (e.g., molecular electrostatic potentials, molecular orbitals, and solvent surfaces).
Figure 2. Orbital Visualization
This figure depicts the highest occupied molecular orbital in the 9,10-dimethylanthracene photodimer, as computed by Kim Baldridge and visualized using MPIRE by Greg Johnson.
Capture What You Compute...
In finished form, the environment would be accessible from a client Web browser through a grid portal: a Web server that can access application servers (one for GAMESS, one for QMView) and act as a service broker for grid-wide services, including Globus, the SDSC Storage Resource Broker, and a variety of shells and transport services. "The next step, of course, is to connect a database or databases to the running environment that can incorporate results from chemical calculations in such a way as to make them reusable in further calculations and cross-compatible with other databases of the same type," Baldridge said. "The ultimate objective is to capture more of what we’re computing to enhance scientific prediction."
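The broker role described above can be pictured with a small Python sketch. It is an illustration only, not the NBCR portal code: the `GridPortal` class, the two stand-in server functions, and the job name are all hypothetical, and real services such as Globus or the Storage Resource Broker are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class GridPortal:
    """Hypothetical broker that maps a service name to an application server."""
    services: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        self.services[name] = handler

    def submit(self, name: str, request: dict) -> dict:
        # Route the browser's request to whichever server handles this service.
        if name not in self.services:
            raise KeyError(f"no application server registered for {name!r}")
        return self.services[name](request)


def gamess_server(request: dict) -> dict:
    # Stand-in for a compute server: pretends to run a job and names its output.
    return {"service": "gamess", "output_file": request["job"] + ".log"}


def qmview_server(request: dict) -> dict:
    # Stand-in for a visualization server that starts from a compute output file.
    return {"service": "qmview", "rendered_from": request["output_file"]}


portal = GridPortal()
portal.register("gamess", gamess_server)
portal.register("qmview", qmview_server)

calc = portal.submit("gamess", {"job": "example_job"})
view = portal.submit("qmview", {"output_file": calc["output_file"]})
print(view)
```

The point of the design is that the client sees only the portal. Adding a database service for captured results would, in this sketch, be one more `register` call.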
The architectural variety encompassed in the studies is extremely broad. "For this presentation, we focused on evaluation of four different architecture types," Baldridge reported. NPACI’s Blue Horizon, an IBM machine at SDSC containing 144 Nighthawk-II nodes with eight processors per node, is a large, distributed-memory machine with a peak speed of 1.7 teraflops (trillions of floating-point operations per second). An experimental Sun HPC machine with 28 nodes incorporating processors running at 750 MHz served as an exemplar of large shared-memory architecture, as did the Cray MTA, which also decomposes code into parallel, fine-grain "threads."
Finally, Baldridge and Greenberg evaluated the SDSC Meteor cluster, built under the direction of SDSC computer scientist Phil Papadopoulos and assembled and maintained with the NPACI Rocks software developed by Papadopoulos and colleagues at SDSC and UC Berkeley. The Meteor Linux cluster includes approximately 90 Myrinet-linked two-processor Intel Pentium III nodes, running at nearly 1.0 GHz. Thirty-five of the nodes are funded by NBCR.
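For a sense of scale, the figures quoted for Blue Horizon and Meteor can be turned into processor counts with a few lines of Python. The inputs are the numbers given in the text; the per-processor peak is simply derived from them and is not a measured value, and the Meteor count is approximate because the node count is.

```python
# Blue Horizon: figures as quoted in the article.
blue_horizon_nodes = 144
processors_per_node = 8
peak_teraflops = 1.7

total_processors = blue_horizon_nodes * processors_per_node
# Derived value: peak gigaflops per processor (1 teraflop = 1000 gigaflops).
peak_gflops_per_processor = peak_teraflops * 1000 / total_processors

# Meteor: "approximately 90" two-processor nodes, so this is an estimate.
meteor_nodes = 90
meteor_processors = meteor_nodes * 2

print(total_processors)                     # 1152
print(round(peak_gflops_per_processor, 2))  # 1.48
print(meteor_processors)                    # 180
```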
"The Rocks software makes the cluster architecture particularly robust and extendible," Papadopoulos said. "For example, a cluster node can be reinstalled in about 10 minutes, or the interconnect can easily be upgraded to something like Compaq’s Servernet II, because of the modularity of the Rocks cluster software." Meteor is now being connected to other clusters, including a Sun server cluster run by NBCR, the Keck I and Keck II satellite clusters on the UCSD campus, and clusters at The Burnham Institute and The Scripps Research Institute.
Platform comparisons were made on benchmark calculations for the light-emitting compound luciferin and an illudin-based anticancer drug called HMAF (under investigation by Trevor McMorris of the UCSD Chemistry Department and UCSD graduate student Laura Gregerson of Baldridge’s group). "There are many tradeoffs among all these architectures," Baldridge said, "but we are working to be able to use them transparently, in concert, to offset latencies and obtain maximum end-to-end throughput." An experimental GAMESS Web portal based on the NPACI GridPort model is now available, she said, thanks to a collaboration with Mary Thomas, leader of SDSC’s Computational Science Portals group, and portals group member Maytal Dahan.
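One way to picture using dissimilar machines "in concert" is a simple greedy scheduler that hands each independent job to whichever machine would finish it soonest. The Python sketch below is an assumption-laden illustration, not the group's method: the `schedule` function is hypothetical, and the machine speeds and job costs are made-up numbers, not benchmark results.

```python
def schedule(task_costs, machine_speeds):
    """Greedy list scheduling of independent tasks on machines of differing speed.

    Tasks are taken largest first; each goes to the machine that would
    complete it earliest given the work already assigned there.
    Returns the task-to-machine assignment and the overall finish time.
    """
    finish = {machine: 0.0 for machine in machine_speeds}
    assignment = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        best = min(
            machine_speeds,
            key=lambda m: finish[m] + cost / machine_speeds[m],
        )
        finish[best] += cost / machine_speeds[best]
        assignment[task] = best
    return assignment, max(finish.values())


# Illustrative values only: relative speeds and abstract work units.
speeds = {"fast_machine": 2.0, "slow_machine": 1.0}
jobs = {"job_a": 4.0, "job_b": 2.0, "job_c": 2.0}

assignment, makespan = schedule(jobs, speeds)
print(assignment)
print(makespan)  # 3.0
```

In this toy case the slower machine still takes one job, and the whole batch finishes in 3.0 time units rather than the 4.0 the fast machine would need alone. A greedy rule like this is not guaranteed to be optimal, and a production scheduler would also account for data transfer and queue wait times.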
...And Compute to Capture Results
In addition to the luciferin and HMAF studies, the Baldridge group has been exercising GAMESS on the various architectures to study new materials, biological fluorescence, and environmental reaction processes in solution. The new materials work includes a series of calculations involving cyclooctatetraene analogues, one of which predicts a state of unexpected antiaromaticity. Another compound in the series was later synthesized by a group in Japan, and it exhibited exactly the properties predicted. Work on biological fluorescence was done in collaboration with Roger Tsien of UCSD and was recently published in the Proceedings of the National Academy of Sciences.
The environmental work, performed by Gregerson and published in the Journal of Physical Chemistry, was a base-level study involving methyl-t-butyl ether (MTBE), as well as a series of analogues. MTBE is used in California as a gasoline additive. It came under scrutiny recently when it was found leaching into groundwater, where it may pose a hazard as a cancer-causing agent. These potential hazards have yet to be fully investigated. "These studies and a broad range of other work should give some idea of the versatility of the hardware systems and the quantum chemical environment we have been designing," Baldridge said.
Future plans include the addition of scheduling software, including a governing mechanism like the Adaptive Parameter Study Template. The template is part of the Applications Level Scheduler (AppLeS), which has been under development in the UCSD group of Fran Berman and will be continued at SDSC under the direction of Henri Casanova.
"This is yet another dimension along which we will be able to unify a spectrum of underlying resources in the service of an overall application," Baldridge said. "Ultimately, we believe a good quantum mechanical biomedical framework or infrastructure would include tutorial software, access to databases like the Protein Data Bank, a QMView facility also containing a data bank, a capability to run a variety of compute engines such as GAMESS or other large modeling software packages, and a scheduler to optimize the use of all available resources." –MM
Kim K. Baldridge
Jerry P. Greenberg
Laura N. Gregerson
Sun Microsystems, Inc.
L.A. Gross, G.S. Baird, R.C. Hoffman, K.K. Baldridge, and R.Y. Tsien (2000): The structure of the chromophore within DsRed, a red fluorescent protein from coral, Proceedings of the National Academy of Sciences 97:11990-11995.
L.N. Gregerson, J.S. Siegel, and K.K. Baldridge (2000): Ab initio computational study of environmentally harmful gasoline additives: Methyl tert-butyl ether and analogues, Journal of Physical Chemistry A 104:11106-11110.