CHARMMing Proteins
Legion of Computers
Figure 1. Protein L
Left: Protein L unfolded. Right: Protein L folded according
to CHARMM computation.

Proteins are synthesized within cells as strings of amino acid molecules
arrayed along a "backbone" of carbon atoms. But just as
soon as a complete protein is manufactured, it folds into a unique
three-dimensional shape, and that shape determines how the protein
will function. One of the leading problems in molecular biology
is to achieve an understanding of protein folding. An NPACI alpha
project is dedicated to exploring the how and why of protein folding
by distributing large-scale protein-folding simulations across the
Grid. The project recently reached a scientific and computational
milestone by simulating the folding of a protein called Protein
L using six different computers, some thousands of miles apart.
According to project leader Charles L. Brooks III, "We compressed
a month's worth of computing on our own systems into just 36
hours by using distributed computing."

The NPACI alpha project, Protein Folding in
a Distributed Computing Environment, is led by Brooks, a professor
of molecular biology from The Scripps Research Institute (TSRI)
in La Jolla, and Andrew Grimshaw, professor of computer science
at the University of Virginia. Their collaborators include research
scientist Michael Crowley of TSRI and computer scientist Anand
Natrajan of Grimshaw's group at Virginia.

CHARMMing Proteins

Crowley worked closely with the Virginia team
members to adapt the molecular dynamics program CHARMM for each
platform involved in the runs. CHARMM, originally developed in
the laboratory of Martin Karplus at Harvard University, is under
continual development by a consortium of scientists worldwide,
with oversight and coordination by Brooks, Karplus, and Bernard
R. Brooks (who is Chief of the Computational Biophysics Section
of NIH/NHLBI/LBC). CHARMM supplies a general simulation environment
for studies of the motions and mechanics of bio-macromolecular
systems.

For the distributed computing test, CHARMM
was used to compute the folding free-energy landscape of Protein
L, a small (62 residues) protein with 585 atoms. The available
processors for each run were divided into "gangs" of
16 processors that performed tightly coupled parallel calculations,
with each gang exploring approximately 200 distinct regions of
conformational space. The loose folds represented in each region
proceeded toward the native folded state during the computation.
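The decomposition described above can be illustrated with a rough sketch: a pool of processors is cut into gangs of 16, and each gang receives its own block of about 200 conformational regions. The function names, the contiguous numbering of regions, and the 56-processor example pool are assumptions made for illustration only; the article does not say how CHARMM or Legion actually mapped regions onto gangs.

```python
GANG_SIZE = 16          # from the article: gangs of 16 tightly coupled processors
REGIONS_PER_GANG = 200  # from the article: roughly 200 regions per gang


def form_gangs(processor_ids, gang_size=GANG_SIZE):
    """Group processor IDs into full gangs; return (gangs, leftover processors)."""
    n_full = len(processor_ids) // gang_size
    gangs = [processor_ids[i * gang_size:(i + 1) * gang_size] for i in range(n_full)]
    leftover = processor_ids[n_full * gang_size:]
    return gangs, leftover


def assign_regions(n_gangs, regions_per_gang=REGIONS_PER_GANG):
    """Give each gang its own contiguous block of region indices (assumed policy)."""
    return [
        list(range(g * regions_per_gang, (g + 1) * regions_per_gang))
        for g in range(n_gangs)
    ]


# Hypothetical pool of 56 processors: three full gangs, eight processors idle.
gangs, leftover = form_gangs(list(range(56)))
regions = assign_regions(len(gangs))
print(len(gangs), "gangs,", len(leftover), "idle processors,",
      sum(len(r) for r in regions), "regions in total")
```

A pool whose size is not a multiple of 16 leaves some processors without a gang, which is one reason a scheduler might prefer machines that divide evenly.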

"We are still analyzing the calculations,"
said Charles Brooks, "and we hope to confirm experimental
observations made by other groups that reveal a very specific
order of folding for Protein L."

The way in which Protein
L folds differs significantly from the way in which a very similar
protein, called Protein G, is known to fold. Both proteins contain
a large "alpha helix" structure laid upon several "beta
strands" (relatively flat ribbons), and their amino acid
sequences are nearly identical. "The fact that the sites
of nucleation or condensation differ may indicate the importance
of very small differences in the sequence. If so, we will be closer
to a deepened understanding of the protein folding problem,"
Brooks said.

Legion of Computers

The key to the recent successful simulations,
according to Brooks, was the use of Legion, a grid operating system
developed by Grimshaw and colleagues with funding from NPACI and
various agencies. "With Legion," said Grimshaw, "all
the scientist needs are compiled codes for each platform that
may be used, a script for dispatching jobs, and another for keeping
track of the results. Legion does everything else, either through
an easy-to-use Web interface or a more traditional Unix command-line
interface." Grimshaw explained that Legion manages queues,
accounting, security, job submission, recovery from errors of
all kinds, status reporting, and the job of returning the output
to the scientist when the calculations are done. Any computing
system registered in the Legion network may participate in the
calculation if compiled code for it exists. Legion inventories
the available resources and schedules the job to take best advantage
of them.

The NPACI systems used in the project included
Blue Horizon, a 1.7-teraflops IBM SP at SDSC; a 32-processor Sun
Enterprise 10000 at SDSC; a 128-processor HP V2500 at Caltech;
a 32-processor Centurion Alpha cluster at Virginia; and IBM systems
at the universities of Michigan and Texas totaling 56 processors.
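The participation rule Grimshaw describes, in which a registered system can take part only if compiled code exists for its platform, can be sketched roughly as follows. This is not Legion's interface: the `Host` record, the platform labels, and the proportional job split are hypothetical. Only processor counts stated in the article are used, so Blue Horizon is left out (no count is given) and the Michigan and Texas systems appear as one combined entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    platform: str    # hypothetical platform label, not a Legion identifier
    processors: int


def eligible_hosts(hosts, compiled_platforms):
    """Keep only hosts whose platform has a compiled binary available."""
    return [h for h in hosts if h.platform in compiled_platforms]


def split_jobs(n_jobs, hosts):
    """Divide n_jobs among hosts in proportion to processor count.

    Largest-remainder rounding keeps the shares summing to n_jobs exactly.
    """
    total = sum(h.processors for h in hosts)
    if total == 0:
        raise ValueError("no eligible processors")
    exact = [n_jobs * h.processors / total for h in hosts]
    shares = [int(x) for x in exact]
    # Hand the jobs lost to rounding to the largest fractional parts.
    order = sorted(range(len(hosts)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[: n_jobs - sum(shares)]:
        shares[i] += 1
    return {h.name: s for h, s in zip(hosts, shares)}


hosts = [
    Host("Sun Enterprise 10000 (SDSC)", "sun", 32),
    Host("HP V2500 (Caltech)", "hp", 128),
    Host("Centurion (Virginia)", "alpha", 32),
    Host("IBM (Michigan and Texas)", "ibm", 56),
]
# Assume, for illustration, that no IBM binary has been compiled yet.
usable = eligible_hosts(hosts, {"sun", "hp", "alpha"})
print(split_jobs(12, usable))
```

In this sketch a platform without a binary is skipped silently, and the remaining hosts absorb its share of the work.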
"There were no Legion run-time failures," said Natrajan.
"We plan to add functionality in the form of archival or
mass storage systems." He credited work done by NPACI computer
scientists led by Nancy Wilkins-Diehr for the successful runs.
MM 
Project Leaders
Charles L. Brooks III, TSRI
Andrew Grimshaw, University of Virginia

Participants
Michael Crowley, TSRI
Anand Natrajan, University of Virginia