Multiple Machines, One Environment
Capture What You Compute...
...And Compute to Capture Results
The tremendous explosion of computational technologies has raised
a new kind of question for each scientific discipline to answer.
Instead of asking, "Which computer is best for my problem?"
scientists must now ask, "What kind of computational environment
do I need to examine, pose, and answer my questions best?"
The answers "comprise a revolution in our thinking about
computational sciences," said SDSC senior principal scientist
Kim K. Baldridge, who leads a project in tool development for
computational chemistry within the National Biomedical Computation
Resource (NBCR) at SDSC. "We have been working on the components
of an answer in computational chemistry for many years, in some
cases, and now we are finding that our answers have reformulated
our questions!"

Figure 1. Triquinane Transitions
This image was produced by Greg Johnson of SDSC using his MPIRE visualization
code. With appropriate equipment, it may be seen as a three-dimensional
rendering of the transition state between two carbocation
structures of the molecule triquinane. Red and dark blue show
regions of high electron occupancy (but opposite orbital phase),
fading to low-density greens. The calculations were done by
Kim Baldridge at SDSC.

Baldridge presented
the work of her NBCR group and collaborators at a symposium on
New Computational Architectures in Chemistry, held at the annual
meeting of the American Chemical Society (ACS) in San Diego in
April 2001. A report on the symposium by Elizabeth K. Wilson was
published as a cover story in the April 30, 2001, issue of the
ACS weekly magazine Chemical & Engineering News
(Figures 1 and 2).

Multiple Machines, One Environment

Baldridge and SDSC
scientist Jerry Greenberg have led the effort to create grid-based
tools for computational chemistry within NBCR, which is supported
by the NIH National Center for Research Resources. "Chemists
have a wide variety of computing needs," Baldridge explained.
"Because our group has evaluated and used a variety of high-performance
computer architectures, we have gained a thoroughgoing appreciation
of their similarities and differences and of the way in which
the architectural variety actually presents new opportunities
for a unified, grid-based approach to the broadest range of chemical
problems."

The computational engine
of the running environment is GAMESS, a code originating from
the research group of Mark S. Gordon (now of Iowa State University,
Ames, where he is also director of a Department of Energy Scalable
Computing Laboratory) that included Baldridge. GAMESS is a widely
used, open-source, ab initio quantum
chemistry package. As a co-developer of the code, Baldridge has
worked since the late 1980s to make GAMESS available on a variety
of platforms and with a variety of enhancements that now constitute
an environment for execution and visualization. In addition, she
has made algorithmic enhancements to enable further quantum chemical
prediction, including recent algorithmic developments to handle
solvation.

"We now think
not just about the high-performance computers and their architectures,
but about grid-based and data-oriented tools that enable visualization
and analysis on the fly," Baldridge said. In particular,
she and Greenberg have worked to layer their code, QMView,
on top of the GAMESS running environment. QMView can begin with
GAMESS output files and deliver both 2-D and volume visualization
in a variety of modes, enabling, for example, 3-D viewing of
electron density, calculated molecular orbitals, surface volumes,
and difference densities (e.g., molecular electrostatic potentials,
molecular orbitals, and solvent surfaces).
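
To make the idea of a difference density concrete: it is the point-by-point subtraction of two electron densities sampled on the same grid, which a viewer can then color by sign. The Python sketch below illustrates only that arithmetic. It is not QMView's code, which the article does not describe; the nested-list grid layout, the function names, the threshold, and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: density[x][y][z], one float per grid point.
Grid = List[List[List[float]]]


def grid_shape(grid: Grid) -> Tuple[int, int, int]:
    """Return (nx, ny, nz) for a non-empty rectangular grid."""
    return len(grid), len(grid[0]), len(grid[0][0])


def difference_density(rho_a: Grid, rho_b: Grid) -> Grid:
    """Subtract rho_b from rho_a at every grid point."""
    if grid_shape(rho_a) != grid_shape(rho_b):
        raise ValueError("densities must be sampled on the same grid")
    return [
        [[a - b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(plane_a, plane_b)]
        for plane_a, plane_b in zip(rho_a, rho_b)
    ]


def sign_label(value: float, threshold: float = 0.01) -> str:
    """Classify a difference value so a viewer could pick a color for it."""
    if value > threshold:
        return "gain"      # more electron density in state A
    if value < -threshold:
        return "loss"      # less electron density in state A
    return "neutral"       # change too small to display


if __name__ == "__main__":
    # Toy 1 x 2 x 2 grids; the values are made up for illustration.
    rho_state_a = [[[0.5, 0.25], [0.125, 0.0]]]
    rho_state_b = [[[0.25, 0.25], [0.25, 0.0]]]
    diff = difference_density(rho_state_a, rho_state_b)
    for row in diff[0]:
        print([sign_label(v) for v in row])
```

A real viewer would read both densities from calculation output and render the result as surfaces or volumes; the sketch stops at the subtraction and sign classification.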

Figure 2. Orbital Visualization
This figure depicts the highest occupied molecular orbital
in the 9,10-dimethylanthracene photodimer, as computed by
Kim Baldridge and visualized using MPIRE by Greg Johnson.

Capture What You Compute...

In finished form, the
environment would be accessible from a client Web browser through
a grid portal, a Web
server that can access applications servers (one for GAMESS, one
for QMView) and act as a service broker for grid-wide services,
including Globus, the SDSC Storage Resource Broker, and a variety
of shells and transport services. "The next step, of course,
is to connect a database or databases to the running environment
that can incorporate results from chemical calculations in such
a way as to make them reusable in further calculations and cross-compatible
with other databases of the same type," Baldridge said. "The
ultimate objective is to capture more of what we're computing
to enhance scientific prediction."

The architectural variety
encompassed in the studies is extremely broad. "For this
presentation, we focused on evaluation of four different architecture
types," Baldridge reported. NPACI's Blue Horizon, an
IBM machine at SDSC containing 144 Nighthawk-II nodes with eight
processors per node, is a large, distributed-memory machine with
a peak speed of 1.7 teraflops (trillions of floating-point operations
per second). An experimental Sun HPC machine with 28 nodes incorporating
processors running at 750 MHz served as an exemplar of large shared-memory
architecture, as did the Cray MTA, which also decomposes code
into parallel, fine-grain "threads." Finally, Baldridge
and Greenberg evaluated the SDSC Meteor cluster, built under the
direction of SDSC computer scientist Phil Papadopoulos and assembled
and maintained with the NPACI Rocks software developed by Papadopoulos
and colleagues at SDSC and UC Berkeley. The Meteor Linux cluster
includes approximately 90 Myrinet-linked two-processor Intel Pentium
III nodes, running at nearly 1.0 GHz. Thirty-five of the nodes
are funded by NBCR.

"The Rocks software
makes the cluster architecture particularly robust and extendible,"
Papadopoulos said. "For example, a cluster node can be reinstalled
in about 10 minutes, or the interconnect can easily be upgraded
to something like Compaq's Servernet II, because of the modularity
of the Rocks cluster software." Meteor is now being connected
to other clusters, including a Sun server cluster run by NBCR,
the Keck I and Keck II satellite clusters on the UCSD campus,
and clusters at The Burnham Institute and The Scripps Research
Institute.

Platform comparisons
were made on benchmark calculations for luciferin (the light-emitting substrate of the enzyme luciferase) and
an illudin-based anticancer drug called HMAF (under investigation
by Trevor McMorris of the UCSD Chemistry Department and UCSD graduate
student Laura Gregerson of Baldridge's group). "There
are many tradeoffs among all these architectures," Baldridge
said, "but we are working to be able to use them transparently,
in concert, to offset latencies and obtain maximum end-to-end
throughput." An experimental GAMESS Web portal based on the
NPACI GridPort model is now available, she said, thanks to a collaboration
with Mary Thomas, leader of SDSC's Computational Science
Portals group, and portals group member Maytal Dahan.

...And Compute to Capture Results

In addition to the
luciferin and HMAF studies, the Baldridge group has been exercising
GAMESS on the various architectures to study new materials, biological
fluorescence, and environmental reaction processes in solution.
The new materials work includes a series of calculations involving
cyclooctatetraene analogues, one of which predicts a state
of unexpected antiaromaticity. In addition, another of the series
of compounds was later synthesized by a group in Japan, and it
exhibited exactly the properties predicted. Work on biological
fluorescence was done in collaboration with Roger Tsien of UCSD
and was recently published in the Proceedings of the National
Academy of Sciences.
The environmental work,
performed by Gregerson and published in the Journal
of Physical Chemistry,
was a base-level study involving methyl-t-butyl ether (MTBE),
as well as a series of analogues. MTBE is used in California as
a gasoline additive. It came under scrutiny recently when it was
found leaching into groundwater, where it may pose a hazard as
a cancer-causing agent. The potential hazards of MTBE need
further investigation.

"These studies and a
broad range of other work should give some idea of the versatility
of the hardware systems and the quantum chemical environment we
have been designing," Baldridge said. Future plans include
the addition of scheduling software, including a governing mechanism
like the Adaptive Parameter Study Template. The template is part of the
Applications Level Scheduler (AppLeS), which is under development in the
UCSD group of Fran Berman and will be continued at SDSC under
the direction of Henri Casanova. "This is yet another
dimension along which we will be able to unify a spectrum of underlying
resources in the service of an overall application," Baldridge
said. "Ultimately, we believe a good quantum mechanical biomedical
framework or infrastructure would include tutorial software, access
to databases like the Protein Data Bank, a QMView facility also
containing a data bank, a capability to run a variety of compute
engines such as GAMESS or other large modeling software packages,
and a scheduler to optimize the use of all available resources."
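
What "a scheduler to optimize the use of all available resources" might mean can be shown in miniature. The Python sketch below uses a simple greedy rule: take the largest job first and give each job to whichever machine would finish it soonest. This is a generic textbook heuristic chosen for illustration, not the AppLeS algorithm or anything the Baldridge group is reported to use; the machine names, relative speeds, and job sizes are all hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_schedule(
    jobs: Dict[str, float], speeds: Dict[str, float]
) -> Tuple[Dict[str, List[str]], float]:
    """Assign jobs to machines, largest job first.

    jobs maps a job name to its work in arbitrary units; speeds maps a
    machine name to work units completed per hour. Returns the plan and
    the time at which the last machine finishes.
    """
    finish_time = {machine: 0.0 for machine in speeds}
    plan: Dict[str, List[str]] = {machine: [] for machine in speeds}

    # Largest jobs first; ties are broken by name so the result is repeatable.
    ordered = sorted(jobs.items(), key=lambda item: (-item[1], item[0]))
    for name, work in ordered:
        # Pick the machine on which this job would complete earliest.
        best = min(
            speeds,
            key=lambda m: (finish_time[m] + work / speeds[m], m),
        )
        finish_time[best] += work / speeds[best]
        plan[best].append(name)

    return plan, max(finish_time.values())


if __name__ == "__main__":
    # Hypothetical workload: four calculations of different sizes.
    example_jobs = {"calc_a": 8.0, "calc_b": 4.0, "calc_c": 4.0, "calc_d": 2.0}
    # Hypothetical machines: one twice as fast as the other.
    example_speeds = {"fast_machine": 2.0, "slow_machine": 1.0}
    plan, makespan = greedy_schedule(example_jobs, example_speeds)
    print(plan)
    print(makespan)
```

A production scheduler would also have to weigh queue waits, data transfer, and changing load, which this sketch ignores.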
MM

Project Leader
Kim K. Baldridge
SDSC

Participants
Jerry P. Greenberg
Laura N. Gregerson
Phil Papadopoulos
SDSC
John Feo
Sun Microsystems, Inc.
Barry Bolding
Cray, Inc.

References
L.A. Gross, G.S. Baird, R.C. Hoffman, K.K. Baldridge, and R.Y. Tsien (2000): The structure of the chromophore within DsRed, a red fluorescent protein from coral, Proceedings of the National Academy of Sciences 97:11990-11995.
L.N. Gregerson, J.S. Siegel, and K.K. Baldridge (2000): Ab initio computational study of environmentally harmful gasoline additives: Methyl tert-butyl ether and analogues, Journal of Physical Chemistry A 104:11106-11110.