    Protein Folding in a Distributed Computing Environment

    PROJECT LEADERS
    Charles L. Brooks III
    The Scripps Research Institute (TSRI)

    Andrew Grimshaw
    University of Virginia,
    NPACI Metasystems thrust area leader

    PROJECT MANAGER
    Andrew Grimshaw
    University of Virginia

    PARTICIPANTS
    David Case, Michael Crowley, TSRI
    Bernard Pailthorpe, John Moreland, Nicole Bordes, SDSC
    John Karpovich, Katherine Holcomb, University of Virginia

    Proteins have been called the "building blocks of life." Unlike bricks or concrete slabs, however, proteins are complex, three-dimensional molecular bundles. Their relationships determine the progress of growth in all organisms, constitute the immune system, and are vital in health and disease. How these bundles form and what makes one substructure differ from another are important questions for molecular biologists and medical scientists. Now an NPACI alpha project led by computational biologist Charles Brooks at The Scripps Research Institute is poised to make an all-out assault on the question of protein structure formation.

    Other project participants include staff researcher Michael Crowley of the Department of Molecular Biology at The Scripps Research Institute (TSRI); Andrew Grimshaw, director of the Institute for Parallel Computation, and staff scientist Katherine Holcomb, both in the Computer Science Department at the University of Virginia; and Bernard Pailthorpe, SDSC associate director for Scientific Visualization.


    Figure 1. The Folding Funnel
    Shown with structures of a small α/β protein, the folding landscape may be depicted as a funnel. Brooks's simulations begin with a database of conformations throughout the folding region.

    DIRECTIONS AND GOALS

    The alpha project's complexity is warranted by the computational advances of recent years. It will test both the infrastructure and the science, which can be accomplished only by using widely distributed elements of that infrastructure. The project is consonant with recent reports and initiatives in computational biology put forward by NSF, NIH, DOE, USDA, and other agencies.

    "We will be using the molecular dynamics programs CHARMM and AMBER, and we'll distribute our problem across many of the NPACI advanced computational resources and experimental clusters, using the Legion metacomputing system developed at Virginia," Brooks said. "We believe that an ambitious problem--an accurate calculation of the folding landscape of a protein containing about 100 amino acids--will help to refine recent theoretical advances and perhaps to extend them."

    "We want to establish a paradigm for exploring key protein-folding questions," he said. "One very practical application is to take low-resolution structure predictions derived from structural genomics and to refine these structures to a biochemically useful resolution."

    REFERENCES

    Brooks, C.L. III, M. Gruebele, J.N. Onuchic, and P.G. Wolynes (1998): Chemical physics of protein folding, Proceedings of the National Academy of Sciences 95, 11037-11038.

    Bursulaya, B.D., and C.L. Brooks III (1999): The folding free energy surface of a three-stranded β-sheet protein, Journal of the American Chemical Society 121, no. 44 (in press).

    Figure 2. Secondary Structure
    Brooks has done simulations of proteins containing (top) only α helices, (middle) an α helix and β sheets, and (bottom) only β sheets. The corresponding diagrams show the folding landscapes in terms of the radius of gyration (in angstroms, vertical axis) versus the fraction of native contacts (horizontal axis). The all-α protein finds its native structure straightforwardly; the others "collapse" in two stages, as shown by the L-shaped landscapes.

    THE CONCEPT OF PROTEIN FOLDING

    Proteins are formed from RNA templates as long polypeptide chains with specific amino acid sequences, and these chains fold into three-dimensional bundles whose structure governs their function. The specific steps of the folding process in living organisms have been hard to discern experimentally and to characterize theoretically. It seems that all the information needed to get to a precise three-dimensional shape is "in there already," contained in the one-dimensional amino acid sequence. In the past few years, molecular biologists (with many other scientists, including physicists and computer scientists) have looked much more deeply into the question of protein folding, asking, "How do they do that?"

    Better answers to this question, Brooks said, are coming from new theoretical perspectives that can be tested and refined by computer simulations and compared with laboratory experiments. "There are really two components of the protein-folding problem," he said. "One I call the 'engineering problem.' The sequence exists--now what is the native structure? There are many different approaches to this problem." Many use a minimalist model or protein description, envisioning the protein as a series of beads occupying lattice sites or embedded in a continuum, finding a structure via global optimization algorithms. "One might say that, in these methodologies, the end point is more important than the process, a kind of brute-force strategy," Brooks said. Nevertheless, such methods are advancing protein structure prediction.

    "The second protein-folding problem is the one I find most intriguing," he said. "How, exactly, do real proteins fold? Does a given amino acid sequence follow a well-defined path or a multitude of paths to a final state? Can that path be modified? The aberrant shapes of prions or of proteins that play a role in the formation of Alzheimer's plaques in the brain--can ways be found to inhibit their transition from native structures or to drive them away from nonproductive or disease-forming states?" Attention to the chemical physics of folding, he noted, has produced a general framework focusing on the "protein-folding energy landscape," which he will be exploring in the course of the alpha project.

    THE FOLDING FUNNEL

    The physics of protein folding has been elaborated recently by a number of investigators, notably evolving from the early work of Peter G. Wolynes of the School of Chemical Sciences at the University of Illinois and follow-on simulations of José N. Onuchic of the Physics Department at UC San Diego. Working with them and with experimentalist Martin Gruebele, also from Illinois, Brooks has used his molecular dynamics studies of small proteins to illuminate the new aspects of the physics involved.

    "A full understanding of the folding process requires a global view of the energy landscape on which proteins fold," Brooks said. The vast conformational space available to the unfolded protein chain rules out random searching to find the correct folded structure. Yet proteins fold on timescales ranging from less than a second to a few minutes, so they obviously drive or are driven rather quickly toward the native state. "Theoretically, folding can be described as the descent of the folding chain down a 'folding funnel,' with local roughness of the funnel reflecting the potential for transient trapping in local minima and the overall slope of the funnel representing the thermodynamic drive to the native state," Brooks said (Figure 1). "A key notion is that, in all but the final stages of folding, there exists an ensemble of structures--protein folding consequently occurs via multiple pathways."
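
    The claim that random searching is ruled out can be made concrete with a back-of-the-envelope estimate of the kind usually attributed to Levinthal. The sketch below is illustrative only: the three conformations per residue and the sampling rate of 10^13 conformations per second are assumed round numbers, not figures from the project, and the chain length of 100 is taken from the alpha project's target of about 100 amino acids.

```python
import math

def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (log10 of conformation count, log10 of years to visit each once).

    Assumes every residue independently adopts one of `states_per_residue`
    conformations (an illustrative simplification, not a physical model).
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    seconds_per_year = 365.25 * 24 * 3600
    log10_years = log10_seconds - math.log10(seconds_per_year)
    return log10_conformations, log10_years

log_conf, log_years = levinthal_estimate(100)
print(f"conformations      ~ 10^{log_conf:.1f}")
print(f"exhaustive search  ~ 10^{log_years:.1f} years")
```

    Under these assumptions an exhaustive search would take vastly longer than the seconds-to-minutes folding times quoted above, which is the motivation for picturing folding as a biased descent down a funnel rather than as a random walk.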

    EFFECTS OF SECONDARY STRUCTURE

    Brooks and colleagues have performed numerous all-atom molecular dynamics simulations of small proteins in explicit solvent to map out the folding free energy landscape along various coordinates. These include the radius of gyration (a measure of the compactness of the molecule) and the fraction of native "contacts" (the coalescence of residues distant in sequence, a measure of the final, folded-up "tertiary" structure).
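
    The two coordinates named above can be computed directly from atomic positions. The following is a minimal sketch, not the CHARMM or AMBER implementation: it treats each residue as a single equal-mass bead, and the 6.5-angstrom contact cutoff, the minimum sequence separation of three residues, and the six-bead toy coordinates are all assumptions chosen for illustration.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid (equal masses assumed)."""
    n = len(coords)
    centroid = [sum(p[axis] for p in coords) / n for axis in range(3)]
    mean_sq = sum(math.dist(p, centroid) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)

def contact_pairs(coords, cutoff=6.5, min_separation=3):
    """Pairs (i, j) at least `min_separation` apart in sequence and closer than `cutoff`."""
    pairs = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.add((i, j))
    return pairs

def fraction_native_contacts(coords, native_pairs, cutoff=6.5):
    """Fraction of the native contact pairs that are also formed in `coords`."""
    if not native_pairs:
        return 0.0
    formed = sum(1 for i, j in native_pairs
                 if math.dist(coords[i], coords[j]) < cutoff)
    return formed / len(native_pairs)

# Toy six-bead chain (hypothetical coordinates in angstroms), one bead per residue.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]   # folded hairpin
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]             # straight chain

native_pairs = contact_pairs(native)
for name, conformation in (("native", native), ("extended", extended)):
    rg = radius_of_gyration(conformation)
    q = fraction_native_contacts(conformation, native_pairs)
    print(f"{name:9s} Rg = {rg:5.2f} A   Q = {q:.2f}")
```

    In this toy case the hairpin is more compact (smaller radius of gyration) and has all of its native contacts formed, while the straight chain has none, which is the sense in which the two coordinates track collapse and tertiary structure separately.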

    The way in which different proteins fold may be related to the predominance of one or another kind of secondary structure in the protein. The main elements of secondary structure are the α helix, in which the polypeptide backbone chain of the protein traces a right-handed helical path, and the β sheet, which consists of two or more straight extended polypeptide chains adjacent to one another (Figure 2).

    "The topology of the protein native state definitely appears to influence the folding mechanism," Brooks said. In all-α proteins, the formation of the tertiary (native) structure occurs concurrently with the formation of secondary structure. In a mixed α/β protein, a general collapse (reduction in radius of gyration) occurs first, followed by evolution toward the native state.
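
    The landscapes discussed here are free energies expressed over the fraction of native contacts and the radius of gyration. The article does not say how the surfaces were computed, so the following is a generic illustration rather than the authors' procedure: it bins unbiased samples of the two coordinates and applies F = -kT ln P relative to the most populated bin. The bin widths, the temperature of 300 K, and the sample values are assumptions.

```python
import math
from collections import Counter

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def free_energy_surface(samples, q_bin=0.1, rg_bin=1.0, temperature=300.0):
    """Map (q_bin_index, rg_bin_index) -> free energy in kcal/mol.

    `samples` is an iterable of (fraction_native_contacts, radius_of_gyration)
    pairs, assumed to be drawn from an unbiased equilibrium ensemble.
    Energies are relative to the most populated bin, which is set to zero.
    """
    counts = Counter((math.floor(q / q_bin), math.floor(rg / rg_bin))
                     for q, rg in samples)
    max_count = max(counts.values())
    kt = K_B * temperature
    return {bin_id: -kt * math.log(count / max_count)
            for bin_id, count in counts.items()}

# Hypothetical samples: 40 near-native conformations, 10 expanded ones.
samples = [(0.85, 10.3)] * 40 + [(0.15, 14.2)] * 10
surface = free_energy_surface(samples)
for bin_id in sorted(surface):
    print(bin_id, f"{surface[bin_id]:.2f} kcal/mol")
```

    Direct histogramming like this only resolves regions that are visited often; mapping barriers and sparsely populated regions generally requires some form of enhanced sampling, which is part of why such calculations are computationally demanding.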

    Most recently, working with postdoctoral colleague Badry D. Bursulaya, Brooks has calculated the folding free-energy surface of a designed, three-stranded β sheet protein called Betanova. The calculations were performed on 64 processors of the Cray T3E at SDSC, which took about one hour to calculate the molecular dynamics for 60 picoseconds. Brooks and Bursulaya investigated about 75 initial states, simulating each for 400 picoseconds. "Here we found a distinct two-stage collapse," Brooks said, like the mixed α/β system. "Water was present in the protein core until late in the simulations, and its expulsion from the interior coincided with the formation of secondary structure."
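
    The figures quoted above imply a rough total cost for the Betanova study. The arithmetic below uses only the numbers in the text and assumes, for illustration, that every run proceeded at the quoted rate of 60 picoseconds per hour on 64 processors; since the text gives "about 75" initial states, the result is an order-of-magnitude estimate and not a reported figure.

```python
PROCESSORS = 64                 # Cray T3E processors per run (from the text)
PS_PER_WALLCLOCK_HOUR = 60.0    # simulated picoseconds per hour on those processors
INITIAL_STATES = 75             # "about 75" initial states
PS_PER_STATE = 400.0            # simulated picoseconds per initial state

total_ps = INITIAL_STATES * PS_PER_STATE             # aggregate simulated time
wallclock_hours = total_ps / PS_PER_WALLCLOCK_HOUR   # if runs are executed back to back
processor_hours = wallclock_hours * PROCESSORS

print(f"aggregate simulated time: {total_ps / 1000:.0f} ns")
print(f"wall-clock time on 64 processors: {wallclock_hours:.0f} hours")
print(f"processor-hours: {processor_hours:.0f}")
```

    Because the initial states are simulated independently, they can in principle run on separate machines at the same time, which is what makes a problem of this shape a natural fit for distributed resources.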

    BECOMING LEGION

    The Brooks group has long been aware of the computational demands of its studies, and in the late 1980s it began working on ways to run its codes on multiple parallel systems. "Now we've been helping them take advantage of the Legion metacomputing system," said Andrew Grimshaw. He and Holcomb have worked with Brooks and Crowley to "Legionize" the CHARMM and AMBER codes. "We've run CHARMM simultaneously on an SGI cluster and our Centurion cluster here at Virginia," Grimshaw said. "We expect to run it similarly on all available parallel platforms within the Legion testbed." Holcomb has also developed a graphical user interface for the Legion/CHARMM system, which was demonstrated at the Fall Internet2 meeting in Seattle in October and which may be seen at the Legion booth at SC99 in Portland, Oregon, in November.

    "Ultimately, the objective is to be able to predict protein structure directly from the amino acid sequence as much as possible," Brooks said. "Our alpha project is designed to help us get there while observing first principles." --MM