


Ironing out the Wrinkles of Protein Folding

Harold Scheraga, Cornell University
Jooyoung Lee, Adam Liwo, Jaroslaw Pillardy, Daniel Ripoll, Kenneth Gibson, Jeffrey Saunders, Cornell University

Within every living cell, tiny, somewhat mitten-shaped structures called ribosomes assemble proteins by stringing together long chains of building blocks called amino acids. The chains loop and intertwine, or fold, in many ways. However, only one way allows the protein to function properly. Protein misfolding can affect the progression of many apparently unrelated diseases such as Alzheimer's disease, cystic fibrosis, and blood-clotting problems. Although the code by which DNA instructs the cell to make proteins is well known in the scientific community, it is not clear how the code in the chain of amino acids makes the protein fold correctly. Harold Scheraga and his team at the Chemistry and Chemical Biology Department at Cornell University are concerned with answering the question of how proteins fold into their native shapes: the protein-folding problem.

In living organisms, the specific steps of the folding process have been hard to discern experimentally and characterize theoretically. It seems that all the information needed to get to a precise 3-D shape is "in there already," contained in the chain of amino acids. For many years, Scheraga has been looking deeply into the question of protein folding, asking, "How do they do it?" Better answers to this question are soon to come from new theoretical perspectives that can be tested and refined by computer simulations and compared with laboratory experiments.

To determine how a protein folds into its native structure, it is necessary to look at the interactions between the atoms that constitute the protein. This can be approached from two sides, the experimental and the theoretical. The experimental work involves genetic engineering, as well as physical and chemical measurements of proteins. On the theoretical side, the Scheraga team studies various forms of amino acid chains and the shape changes of proteins by calculating their stable conformations with statistical-mechanical and molecular-mechanical methods.




Figure 1. Best Results of CASP3
For CASP3, Harold Scheraga and his team of researchers from Cornell University studied the structures of several proteins, including one called HDEA, from E. coli bacteria. This figure shows stereoviews of superpositions of two calculated HDEA fragments (yellow) on the corresponding parts of the experimental structure (red). The parts of the chains for which the predicted structure matches the experimental one are shown as ribbons; misaligned parts are shown as thin lines. These were the best results for this protein of all the teams entered in CASP3 because the theoretical and experimental structures are so close to being aligned--it would take one rotation (top) to bring the two structures into coincidence.


"But there is also another question: how do we know we're doing it right?" said Scheraga, the George W. and Grace L. Todd professor emeritus of chemistry and chemical biology at Cornell. "The only way we can know is by blind test, and this is where CASP comes in." CASP is the Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction, now in its fourth edition (CASP4).

The CASP experiments, held every two years, aim at establishing the current state of the art in protein structure prediction, identifying what progress has been made, and highlighting where future effort may be most productively focused. "CASP starts in April, so CASP4 has just begun," Scheraga said. "We're expecting to receive amino acid sequences from X-ray crystallographers via CASP. We apply our computational methodology and come up with a calculated set of coordinates of the atomic positions, which then is evaluated by the CASP judges."

During CASP3 in 1998, Scheraga's team worked all summer and submitted the coordinates by September 1, 1998 (Figure 1). They were evaluated by the CASP judges at a meeting in Asilomar, California, in December of the same year. "We have improved our computational methods, and are now involved in CASP4," Scheraga said. "During CASP3, parts of our structures were right, but some of the chains were not oriented properly with respect to each other. Any day now, we'll start receiving the amino acid sequences from the CASP4 people."



Much of the experimental work in the Scheraga lab involves determining the folding pathways of the pancreatic enzyme ribonuclease A (Figure 2), as well as the mechanism by which thrombin acts on fibrinogen during blood clotting to form fibrin, the end product of blood coagulation.

"There are several pathways for folding any protein, and we're still working out the details," Scheraga said. "The chain folds up in different stages, and we use Nuclear Magnetic Resonance (NMR) to see the changes along the way." NMR is the selective absorption of electromagnetic radiation by an atomic nucleus in the presence of a strong, static, magnetic field, to provide interatomic distances that can be converted to a 3-D structure of the final folded protein.

According to what is known as the thermodynamic hypothesis of protein folding, the native structure corresponds to the global minimum of the free energy, as opposed to only a local minimum (the kinetic hypothesis of protein folding). In their theoretical work, members of Scheraga's team calculate protein structure via global optimization using ab initio methods. The goal is to find the structure with the lowest potential energy. "We don't use auxiliary aids, such as secondary structure prediction, homology modeling, or threading templates from the Protein Data Bank," Scheraga said. "We just require that the potential energy is the lowest possible--that is, the global minimum--on the potential energy surface. In this way, we optimize the potential energy. Therefore, the protein responds to energy, the way it does in nature."
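The difference between a local minimum and the global minimum can be illustrated with a toy example. The sketch below is not the Scheraga team's code or method. It assumes a hypothetical one-dimensional energy function with two wells, and it uses a simple multistart strategy (local minimization from several starting points) as a stand-in for a real global-optimization algorithm. All function names and parameter values are illustrative.

```python
def energy(x):
    """Hypothetical 1-D 'energy surface' with a shallow well and a deep well."""
    return x**4 - 4.0 * x**2 + x


def gradient(x):
    """Analytical derivative of the toy energy function."""
    return 4.0 * x**3 - 8.0 * x + 1.0


def local_minimize(x, step=0.01, iterations=5000):
    """Plain gradient descent: slides downhill into the nearest well."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def multistart_minimize(starts):
    """Run a local search from every start and keep the lowest-energy result."""
    candidates = [local_minimize(x0) for x0 in starts]
    return min(candidates, key=energy)


# A single downhill search can get trapped in the shallower (local) well.
trapped = local_minimize(2.0)

# Searching from several starting points recovers the deeper (global) well.
best = multistart_minimize([-2.0, -1.0, 0.0, 1.0, 2.0])

print(f"single search: x = {trapped:.3f}, E = {energy(trapped):.3f}")
print(f"multistart:    x = {best:.3f}, E = {energy(best):.3f}")
```

Here the single search started at x = 2.0 settles in the shallower well, while the multistart search finds the deeper one. Real protein energy surfaces have many dimensions and far more local minima, so simple restarts like this do not scale, and more sophisticated search strategies are needed.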

Standard approaches to studying the interactions between protein atoms treat them primarily as pairwise interactions, that is, interactions within pairs of atoms.

One of the key interactions responsible for the specific patterns of protein folding is the hydrophobic interaction--the tendency for somewhat water-insoluble nonpolar molecules to cluster. In the past few years, it has been recognized that, for valid protein-structure prediction, it is necessary to incorporate the cooperative nature of these hydrophobic interactions, that is, to take account of the fact that the interactions are not pairwise but involve groups of three or more atoms--so-called multiple interactions. This can be accomplished by including multibody terms in the representation of the potential energy of the system. Because these hydrophobic interactions are so complex, supercomputer modeling of simple systems can be of great benefit in understanding their nature.
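To make the distinction between pairwise and multibody terms concrete, the sketch below computes a total energy as a sum over pairs plus a sum over triples. It is an illustration only. The pair term uses the standard Lennard-Jones 12-6 form in reduced units, while the three-body term (a fixed bonus whenever three particles are mutually within a cutoff) is a hypothetical placeholder, not the multibody term used by the Scheraga team. All names and parameter values are assumptions.

```python
import itertools
import math

EPSILON = 1.0  # pair well depth (reduced units, assumed)
SIGMA = 1.0    # pair length scale (reduced units, assumed)
CUTOFF = 1.5   # distance below which particles count as "in contact" (assumed)
K_COOP = 0.5   # strength of the hypothetical cooperative three-body bonus


def pair_energy(r):
    """Lennard-Jones 12-6 energy for two particles a distance r apart."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (s6 * s6 - s6)


def pairwise_total(coords):
    """Sum of pair energies over every pair of particles."""
    return sum(
        pair_energy(math.dist(a, b))
        for a, b in itertools.combinations(coords, 2)
    )


def three_body_total(coords):
    """Hypothetical cooperative term: a bonus for each mutually close triple."""
    total = 0.0
    for a, b, c in itertools.combinations(coords, 3):
        longest = max(math.dist(a, b), math.dist(b, c), math.dist(a, c))
        if longest < CUTOFF:
            total -= K_COOP
    return total


def total_energy(coords):
    """Pairwise energy plus the multibody (here, three-body) correction."""
    return pairwise_total(coords) + three_body_total(coords)


# Three particles in a compact triangle plus one far away from the cluster.
cluster = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.5, math.sqrt(3.0) / 2.0, 0.0),
    (10.0, 0.0, 0.0),
]

print("pairwise only:  ", pairwise_total(cluster))
print("with three-body:", total_energy(cluster))
```

Only the compact triangle contributes to the three-body sum, so the clustered particles are stabilized by more than their pair energies alone would suggest. That extra stabilization of a cluster is the kind of cooperative effect a purely pairwise model cannot represent.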

The results of simulations of the behavior of small hydrophobic molecules like methane in aqueous solution will potentially lead to a better understanding of hydrophobic interactions. This, in turn, will be a crucial step for solving the protein-folding problem. In addition, a better description of hydrophobic interactions can also lead to a greater understanding of how proteins (typically enzymes) react with other biological systems as a basis for drug design.



J. Lee, A. Liwo, D.R. Ripoll, J. Pillardy, J.A. Saunders, K.D. Gibson, and H.A. Scheraga. 2000. Hierarchical energy-based approach to protein-structure prediction: Blind-test evaluation with CASP3 targets. Int. J. Quant. Chem. 77: 90-117.

J. Lee, A. Liwo, D.R. Ripoll, J. Pillardy, and H.A. Scheraga. 1999. Calculation of protein conformation by global optimization of a potential energy function. Proteins--Struct. Func. and Genet. Suppl. 3: 204-208.

A. Liwo, J. Lee, D.R. Ripoll, J. Pillardy, and H.A. Scheraga. 1999. Protein structure prediction by global optimization of a potential energy function. Proc. Nat. Acad. Sci. 96: 5482-5485.

Figure 2. Ribbon Diagram of Bovine Pancreatic Ribonuclease A
Pancreatic ribonuclease A is a model protein for studying how disulfide bonds affect protein folding. The spherical beads are the four proline amino acid residues. Proline is an amino acid that occurs in essentially all proteins, and to a very large extent in collagen, the main structural protein of the connective tissue in skin, bone, cartilage, tendons, and teeth. The ribbons indicate alpha-helices (spirals) and beta-structures (sheets), and the thin, dark-blue lines indicate the locations of the four disulfides.


Scheraga's work for CASP3 was done primarily on 64 processors of the IBM SP supercomputer at the Cornell Theory Center. Calculations were done for seven different proteins. The smallest, with 30 residues (amino acids), took one hour. The largest, with 140 amino acids, took five days on 64 processors. Related calculations of crystal structures, which help to refine the energy functions, were carried out at Cornell and also on the IBM SP at SDSC.

For CASP4, the Scheraga team will be using the Velocity cluster at Cornell, as well as Blue Horizon at SDSC, and its own PC-Linux cluster. The codes used on these machines were written by the Scheraga team.

To take bigger strides in solving the protein-folding problem, more research will need to be done, and Scheraga looks forward to continuing his work. "Ultimately, the objective is to be able to predict protein structure directly by ab initio calculations, that is, by global optimization of the potential energy. While there is still much work to be done, the progress so far is encouraging." --AV
