doors down the hallway from SDSC's supercomputers, a student
pored over research papers, probed online databases, and experienced
the thrill of cutting-edge biological research under the guidance
of a skilled mentor. Christopher Peabody, a participant in the
Research Experiences for Undergraduates (REU) program, spent the
summer channeling. He hunted for evidence of tiny channels on
proteins that act like slippery conduits, moving molecules rapidly
from one enzyme to another. Pharmaceutical companies are exploring
the channels as potential targets for new drugs.
[Figure 1. Kevin Chan discovered that 78 neutral atoms could theoretically
settle into this double icosahedral shape. Image courtesy of Robert Leary.]
Like a detective, Peabody
investigated computer databases, virtual haystacks of the
results of millions of biological experiments and analyses, in
search of a needle: the identity of a particular channel.
Peabody, a junior at
UC Berkeley, is one of many students who participated in the REU
program at SDSC and PACI partner universities in Kentucky, Maryland,
Montana, Rhode Island, and Tennessee. The REU program, a national
effort funded by the National Science Foundation, is designed
to give undergraduates, particularly women, minorities, and people
with disabilities (students underrepresented in the sciences
and computing), rewarding research experiences. The REU students
worked with scientists involved in computational projects ranging
from Web-server software to a database on the brain.
Scientists at UCSD
and elsewhere used computer modeling to discover channels. The
theoretical structures explain how enzymatic reactions can occur
up to 20 times faster than if the substrate molecules simply moved
from one enzyme to another by diffusion.
The homepage of the
National Center for Biotechnology Information was Peabody's
doorway to one of the largest collections of biological data ever
assembled. It is organized so that anybody can quickly find the
amino acid sequence, three-dimensional structure, and other characteristics
of thousands of proteins. Dihydrofolate reductase (DHFR) is the
scientific name of the particular channeling enzyme Peabody investigated.
DHFR is important because
each cell capable of making DNA, including microorganisms, plant
cells, and most cells in the human body (even cancer cells), uses
this enzyme. Unfortunately, scientists don't know which amino
acids out of dozens that make up DHFR form its channel.
"[We don't] even know how many amino acid residues are needed to make a channel,"
said Peabody's mentor, SDSC associate staff scientist Chris
Smith. "Is it one, two, three, four, or more? What are the
spaces between them?"
Peabody tried to find
out. He compared the amino acid sequences of more than 800 DHFR
enzymes in the NCBI database. He examined enzymes from bacteria
to fruit flies, looking for homologous motifs, or short stretches
of matching amino acids that could be candidates for channels.
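As a rough illustration of what looking for short stretches of matching amino acids can mean in practice, the sketch below finds every fixed-length substring shared by all sequences in a set. It is a minimal sketch under stated assumptions, not the procedure Peabody used: the function name `shared_motifs` and the toy sequences are hypothetical, and real comparisons normally rely on alignment tools that tolerate mismatches and gaps.

```python
def shared_motifs(sequences, k):
    """Return the set of length-k substrings present in every sequence."""
    if not sequences:
        return set()

    def kmers(seq):
        # All contiguous windows of length k in one sequence.
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    common = kmers(sequences[0])
    for seq in sequences[1:]:
        common &= kmers(seq)  # keep only windows seen in every sequence so far
    return common


# Hypothetical toy sequences in one-letter amino acid code (not real DHFR data).
toy_sequences = [
    "MKLVWAGDPRTS",
    "TTLVWAGDQQRS",
    "AALVWAGDPKKK",
]
motifs = shared_motifs(toy_sequences, 5)
print(sorted(motifs))
```

Exact matching keeps the example short; it would miss motifs that differ by even one residue, which is why it should be read only as a picture of the idea.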
"I didn't know anything about computers until I started
here this summer," said Peabody, who is majoring in molecular
and cellular biology and political science at Berkeley. "I'm
learning a lot; that's the whole benefit for me."
While Peabody was channeling,
REU student Kevin Chan, a sophomore at Harvard University majoring
in mathematics, was "hopping" in another room at SDSC.
Chan used a variation of a well-known mathematical technique to
discover a novel arrangement of atoms missed by other scientists.
He found that 78 neutral atoms could theoretically settle into
the shape of a particular double icosahedron.
The structure Chan
discovered (Figure 1) looks like two 55-atom icosahedrons, 20-sided
objects, fused together, with a few atoms missing. "I enjoy
studying concepts such as symmetry in mathematics," said
Chan, "but I'm also fascinated with concrete examples
of them. I find that very interesting."
such as the one Chan performed generate geometries that correlate
well with the actual 3-D shapes of clusters found in nature. For
example, molecules in supercooled liquids and simulated molecules
in supercomputer modeling often form icosahedral clusters and
polytetrahedral clusters, collectives of pyramid-shaped pieces.
Chan made his discovery
while working under the direction of Robert Leary, an SDSC applied
mathematician. Leary described Chan's cluster in summer talks
at SDSC. The results have been electronically published in The
Cambridge Cluster Database, and Leary and Chan also are planning
to submit the results to a scientific journal.
result is really quite surprising when you consider the considerable
attention given to this problem in previous computational studies,"
Leary and his colleagues
use a variety of modeling techniques to examine how 10 to more
than 100 neutral atoms arrange themselves into the lowest energy
states possible. Their method works like an explorer scanning
new countryside. An algorithm hops from one possible energetic
state to another, each of which corresponds to a different cluster
geometry, looking for the one with the deepest valley.
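One simple way to picture a search for the deepest valley is a downhill-only walk repeated from many random starting points. The sketch below does this on a one-dimensional toy landscape. It is only an illustration under stated assumptions: `toy_energy`, `descend`, and `random_restart_search` are hypothetical names, the landscape is invented, and the code is not the algorithm used by Leary and his colleagues, whose searches run over the coordinates of every atom in a cluster.

```python
import math
import random


def toy_energy(x):
    """Hypothetical 1-D landscape with several valleys (not a real cluster energy)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def descend(f, x, step=0.1, tol=1e-6):
    """Downhill-only local search: move while a neighbor is lower, else shrink the step."""
    while step > tol:
        if f(x - step) < f(x):
            x -= step
        elif f(x + step) < f(x):
            x += step
        else:
            step /= 2.0
    return x


def random_restart_search(f, low, high, restarts=50, seed=0):
    """Descend from many random starting points and keep the lowest valley found."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_x = descend(f, rng.uniform(low, high))
    for _ in range(restarts - 1):
        x = descend(f, rng.uniform(low, high))
        if f(x) < f(best_x):
            best_x = x
    return best_x, f(best_x)


best_x, best_energy = random_restart_search(toy_energy, -6.0, 6.0)
print(best_x, best_energy)
```

Each restart can only reach the bottom of the valley it happens to land in, so the chance of finding the deepest valley grows with the number of restarts.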
research, conducted primarily with supercomputer simulations,
may help solve a pressing problem in chemical physics: how
proteins fold into their biologically active, low-energy shapes
out of many trillions of possibilities. Mad cow disease involves
a misfolded protein. Several other human diseases also involve
aggregations of misfolded proteins, including "new variant"
Creutzfeldt-Jakob disease, the human form of mad cow disease.
Chan used a mathematical
expression called the Morse potential to describe the forces between
all the possible pairs of atoms in atomic clusters. He accomplished
the huge mathematical task with a Sun Microsystems Enterprise
10000 machine, a 32-processor system at SDSC.
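To make the idea of summing forces over all pairs of atoms concrete, the sketch below evaluates a Morse-type pair energy for every pair in a small cluster. It is a sketch under stated assumptions: the reduced-unit form of the potential (minimum energy of -1 at separation `r0`) is the one commonly used in cluster studies, and the range parameter `rho = 6.0` is an illustrative choice. The article does not say which form or parameter values Chan used, and the function names are hypothetical.

```python
import math
from itertools import combinations


def morse_pair(r, rho=6.0, r0=1.0, eps=1.0):
    """Morse pair energy in reduced units; its minimum is -eps at separation r0."""
    x = math.exp(rho * (1.0 - r / r0))
    return eps * x * (x - 2.0)


def cluster_energy(coords, rho=6.0):
    """Sum the pair energy over all distinct pairs of atoms in the cluster."""
    return sum(
        morse_pair(math.dist(a, b), rho) for a, b in combinations(coords, 2)
    )


# A regular tetrahedron with unit edges: all 6 pairs sit at the ideal distance.
tetrahedron = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.5, math.sqrt(3.0) / 2.0, 0.0),
    (0.5, math.sqrt(3.0) / 6.0, math.sqrt(2.0 / 3.0)),
]
print(cluster_energy(tetrahedron))
```

A 78-atom cluster has 3,003 pairs, and a search evaluates this sum for a very large number of candidate geometries, which is where the computing power comes in.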
Chan started with a
few atoms, then moved on to larger aggregations. His basin-hopping results
had been agreeing with those of other researchers until he reached
the 78-atom cluster. Scientists had previously reported that the
lowest energy state for this kind of cluster is derived from a
Mackay icosahedron. To make that 78-atom structure, 23 atoms are
added as a partial layer to a perfect, 55-atom, 20-sided icosahedron.
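The atom counts here follow from the standard formula for the number of atoms in a complete Mackay icosahedron with n shells around a central atom, N = (10n^3 + 15n^2 + 11n + 3) / 3. The short check below (with the hypothetical helper name `mackay_atoms`) shows that 55 atoms corresponds to two complete shells and that 78 atoms falls between complete icosahedra, which is why the extra 23 atoms form only a partial layer.

```python
def mackay_atoms(shells):
    """Atoms in a complete Mackay icosahedron with the given number of shells."""
    n = shells
    return (10 * n**3 + 15 * n**2 + 11 * n + 3) // 3


complete_sizes = [mackay_atoms(n) for n in range(4)]
extra_atoms = 78 - mackay_atoms(2)
print(complete_sizes, extra_atoms)
```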
techniques search energy landscapes by going up and down, looking
for the lowest valley, the global minimum. Leary's technique
only goes down. When it finds the lowest valley in a local energy
landscape, the algorithm finds a random new starting place and
searches anew for a different valley, repeating the cycle over
"This new double-icosahedral
structure has a lower potential energy than the other structure,"
said Chan. This means that his structure, not the single icosahedral
structure previously reported, may represent the actual 3-D structure
of 78-atom clusters in nature.
Preserving the mushrooming
data sets of the information age is a central focus of research
in SDSC's Data-Intensive Computing Environments (DICE) group.
REU student Michelle Schumaker worked on software that addresses
this problem under the supervision of SDSC researchers Bertram
Ludäscher and Richard Marciano.
Schumaker helped develop
the prototype of a data-packaging and archival toolkit that can
bundle a collection of files or digital objects, and attach descriptions
called metadata. The toolkit enables a person putting information
into the collection to add metadata for each object. The toolkit
will eventually work with plug-ins to automatically extract metadata,
and to wrap, or translate, the file into the versatile exchange
format, the Extensible Markup Language (XML). "This facilitates
later access and migration to new technologies," said Schumaker.
In this prototype, the metadata can be anything from free text
to classic metadata or knowledge representations involving logic
rules that state properties of the digital objects.
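The general idea of bundling objects with their descriptions in XML can be sketched in a few lines. The example below is a minimal illustration, not the DICE toolkit itself: the function `package_collection`, the element names (`collection`, `object`, `metadata`, `field`), and the sample file names are all hypothetical, and the sketch records only metadata, not the file contents.

```python
import xml.etree.ElementTree as ET


def package_collection(name, objects):
    """Wrap a list of (filename, metadata dict) pairs in one XML document."""
    root = ET.Element("collection", {"name": name})
    for filename, metadata in objects:
        obj = ET.SubElement(root, "object", {"file": filename})
        meta = ET.SubElement(obj, "metadata")
        for key, value in metadata.items():
            # One <field> per metadata entry, named by its key.
            field = ET.SubElement(meta, "field", {"name": key})
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample collection.
xml_text = package_collection("demo-archive", [
    ("report.txt", {"creator": "J. Doe", "description": "free-text note"}),
    ("image.tif", {"format": "TIFF"}),
])
print(xml_text)
```

Because the result is plain, self-describing text, it can be read back with any XML parser, which is the property that makes such a format attractive for long-term archives.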
The ability to access
and move records as hardware and software evolve
requires archival formats that are as infrastructure-independent
and as self-contained as possible. "We need to store not
only the digital objects in a self-describing format like XML,
but also to preserve metadata, that is, information about the
context of the data and its processing history, as well as knowledge
about how to interpret and navigate the data," said Ludäscher.
"We met often
and evaluated what we were doing, so it's constantly evolving.
I really liked the fast-moving nature of this research,"
said Schumaker. "It's a stimulating environment. I learned
a great deal every day from conversations with the researchers.
It's very helpful for my career choices, since I learned
first-hand what research is like."
Chris Harper, a 2001
computer science graduate of San Diego State University (SDSU),
said his REU experience was invaluable. "I probably learned
more as an REU student than I did from my college classes,"
he said. "The experience was irreplaceable."
At the NPACI Education
Center on Computational Science and Engineering (EdCenter) at
SDSU, Harper worked on a Web-based project, collecting interactive
learning materials, assignments, reviews, and the names of people
with an interest in computational science. Harper used Java, HTML,
and SQL to update the Computational Science Resources Community
website. The EdCenter also learned from Harper and REU student
Lindsay Stocks. "It has provided fresh perspectives on our
projects," said Kirsten Barber, an EdCenter computer applications
Once Harper became
involved, he realized the positive impact that his work could
have on his career. "I told other students about it and they
said things like, 'You have such a cool internship.'"
Kris Stewart, founder
and director of the EdCenter, said the REU program gives her program
the resources to try new things.
"The REU experience
is an excellent opportunity for our students to explore interesting
areas that are new to them, and to interact closely with leading
professionals in the field in a collaborative manner whenever
possible," said EdCenter staff scientist Jeff Sale. "This
goes well beyond what they might ordinarily get exposed to in
their other undergraduate coursework." CF, RG, PT