Pharm 207/Bio 207 Home Page 

Using Internet Resources in Molecular Biology - Lecture 9

Protein Structure Prediction


 

Lecturer: Phil Bourne
Last Update: November 19, 2002

Table of Contents

Lecture Outline

  • What is protein structure prediction?
  • Brief description of methods
    • Comparative (homology) modeling 
    • Fold recognition (threading)
    • Molecular mechanics (ab initio methods)
  • Performing a prediction using a threading technique


| TOC & Lecture Outline | Introduction | Reading Materials | Lecture Goals and Assignment | Examples |


Introduction

Protein structure prediction, also referred to as the protein folding problem, is the prediction of the tertiary structure of a protein from its amino acid sequence. This is highly desirable because (a) the structure of a protein tells us a great deal about the function of that biological macromolecule; (b) experimental techniques (predominantly X-ray crystallography) are time consuming and expensive relative to the determination of primary sequence; and (c) many proteins (e.g., membrane proteins) do not lend themselves to experimental structure determination.

The underlying assumption in making such a prediction is that the structure can indeed be inferred from the sequence alone. Denaturation and renaturation studies tend to support this notion, although there is evidence that chaperones assist the folding process in some cases.

The following figure provides a general overview of protein structure prediction. Numbers in square brackets refer to chapters in reference 1 in "Reading Materials".

Consider very briefly the methods embodied in the above figure.

Ab Initio Methods

These methods rely on the assumption that the folded protein is in a state of lowest free energy, and they attempt to compute that lowest-energy conformation from the possible interactions between the residues comprising the sequence. This is a very compute-intensive problem, and the simplifying assumptions needed to make it tractable preclude accurate prediction. However, in principle these could be the most successful methods since, unlike threading and homology modeling, they are not necessarily biased by existing structural information (this could be viewed as a bad thing if the real structure turns out to be highly homologous to a known one, or a good thing if it is an example of a new fold). These methods are composed of:

Many different types of empirical force fields and levels of structural description are in use.
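To give a feel for why the search is so expensive, the following minimal sketch uses the two-dimensional hydrophobic-polar (HP) lattice model, a deliberately crude toy model chosen here for illustration rather than one of the force fields used in practice. Each residue is either hydrophobic (H) or polar (P), the chain occupies a self-avoiding walk on a square lattice, and the energy is -1 for every pair of H residues that touch on the lattice without being bonded. The function names are hypothetical. An exhaustive search is guaranteed to find the lowest-energy conformation, but it examines 4^(n-1) move strings for a chain of n residues, which is already hopeless for a small real protein.

```python
from itertools import product

# Unit steps on a 2D square lattice (toy HP model, not a real force field).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Place residues along a move string; return None if the chain hits itself."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:  # self-avoiding walk: two residues cannot share a site
            return None
        coords.append(nxt)
    return coords


def hp_energy(seq, coords):
    """-1 for every pair of H residues that are lattice neighbours but not chain neighbours."""
    index = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips bonded neighbours and counts each contact once
            if j is not None and j > i + 1 and seq[j] == "H":
                e -= 1
    return e


def exhaustive_search(seq):
    """Try all 4**(n-1) move strings and keep the lowest-energy valid conformation."""
    best_e, best_coords = 0, None
    for moves in product(MOVES, repeat=len(seq) - 1):
        coords = fold(moves)
        if coords is None:
            continue
        e = hp_energy(seq, coords)
        if best_coords is None or e < best_e:
            best_e, best_coords = e, coords
    return best_e, best_coords


if __name__ == "__main__":
    energy, coords = exhaustive_search("HPPHPPH")
    print(energy, coords)
```

Practical ab initio methods replace the exhaustive loop with heuristic searches (e.g., Monte Carlo or simulated annealing) and the toy energy with a more detailed potential; the lattice and scoring here are assumptions made only to keep the example short.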

Comparative Modeling

Comparative modeling is used when there is a clear relationship between the sequence of the unknown and the sequence(s) of proteins whose structures are known. In this case an approximately correct fold is assured, and the problem becomes one of fine tuning: small shifts in the main chain (1-2 Å), the orientation of the side chains, and the conformations of loop regions.
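The core step of comparative modeling can be sketched as follows, assuming an alignment between the unknown (query) and a template of known structure is already available as two equal-length gapped strings. Coordinates of aligned query residues are copied from the template; query residues opposite a gap have no template counterpart and are exactly the loop regions that still need to be modeled. The function names and data layout are hypothetical, and a real modeling program would also rebuild side chains and refine the main chain.

```python
def transfer_coordinates(query_aln, template_aln, template_coords):
    """Copy template coordinates onto aligned query residues.

    query_aln / template_aln: equal-length aligned strings, '-' marks a gap.
    template_coords: one (x, y, z) tuple per residue of the ungapped template.
    Returns one entry per query residue: copied coordinates, or None where the
    query residue faces a gap and would need loop modeling.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    model = []
    t = 0  # index into the ungapped template
    for q_res, t_res in zip(query_aln, template_aln):
        if q_res != "-":
            model.append(template_coords[t] if t_res != "-" else None)
        if t_res != "-":
            t += 1
    return model


def percent_identity(query_aln, template_aln):
    """Identical residues as a percentage of aligned (non-gap) positions."""
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


if __name__ == "__main__":
    template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    print(percent_identity("ACD-E", "AC-KE"))
    print(transfer_coordinates("ACD-E", "AC-KE", template_ca))
```

Percent identity is computed here over aligned (non-gap) positions only; other conventions divide by the length of the shorter sequence, so values are not always comparable between tools.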

Fold Identification (Threading)

There are many cases where folds are the same yet the sequences are very different; if this were not so, there would not be much fun in the protein folding problem. The approach is to define a database of the cores of known folds and then attempt to fit (thread) the sequence onto each fold in that database, evaluating the suitability of each possible fit. In one sense this reduces an effectively infinite problem (ab initio) to a one-in-a-thousand problem, since one could argue that there are approximately 1000 known folds (this number itself is subject to much argument). Assuming a fold can be recognized in this way, the problem then reduces to the same issues faced in comparative modeling.
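A toy version of fold recognition is sketched below. Each fold in a hypothetical library is reduced to a string of per-position environments ("B" for buried, "E" for exposed), and a made-up score rewards hydrophobic residues in buried positions and polar residues in exposed ones. The query is slid along each fold without gaps and the folds are ranked by their best score. The environment labels, the scoring values and the gapless placement are all simplifying assumptions; real threading methods allow insertions and deletions and use statistically derived potentials.

```python
# Hypothetical toy scoring: which residues count as hydrophobic.
HYDROPHOBIC = set("AVILMFWC")


def position_score(residue, environment):
    """Made-up compatibility of one residue with one structural environment."""
    if environment == "B":  # buried core position
        return 1.0 if residue in HYDROPHOBIC else -1.0
    # exposed position: mildly prefer polar residues
    return 0.5 if residue not in HYDROPHOBIC else -0.5


def thread(sequence, profile):
    """Best gapless placement of `sequence` along one fold's environment string."""
    best = None
    for offset in range(len(profile) - len(sequence) + 1):
        score = sum(position_score(r, profile[offset + i]) for i, r in enumerate(sequence))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def recognize_fold(sequence, library):
    """Rank the folds in `library` (name -> environment string) by best threading score."""
    hits = []
    for name, profile in library.items():
        if len(profile) < len(sequence):
            continue  # this gapless sketch cannot fit a sequence onto a shorter fold
        score, offset = thread(sequence, profile)
        hits.append((score, name, offset))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    library = {"foldA": "BBEEBB", "foldB": "EEEEEE"}
    for score, name, offset in recognize_fold("LLKKLL", library):
        print(name, score, offset)
```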

What is the Bottom Line?

Just how well does each of these methods do? A recent answer to this question can be found in CASP3, the Third Meeting on the Critical Assessment of Techniques for Protein Structure Prediction, held at the Asilomar Conference Center, December 13-17, 1998.

In short, progress is being made, and in specific cases useful information (from a biological perspective) might be obtained, but there is still a long way to go before any combination of procedures can be used with a measurable degree of certainty.




Reading Materials

  1. (Required) Bourne, P.E. (2002) CASP and CAFASP experiments and their findings. In Structural Bioinformatics, eds. Bourne and Weissig. Wiley, New York.
  2. (Reference) Sternberg, M.J.E. (ed.) (1996) Protein Structure Prediction: A Practical Approach. IRL Press at Oxford University Press. 298 pp.

 

Software and information resources:




Examples

General scenario:

  1. Select new structures which do not have obvious sequence homology with known structures (try members of the above list and the FASTA option of the PDB).
  2. Extract the sequence using the PDB and predict the fold using the Fold recognition server at Ben Gurion University or the 123D fold recognition server.
  3. Other servers are available at the CAFASP site.
  4. Analyze the prediction by evaluating how well the suggested alignment to the matching structure describes the query fold, using the Alignment-to-superposition converter and Structure alignment by CE.
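Step 4 amounts to comparing the alignment proposed by the fold recognition server with the structural alignment of the query and the matched structure (for example, one obtained from CE). One simple measure, sketched here under our own assumptions, is the fraction of residue pairs in the structural (reference) alignment that the predicted alignment reproduces exactly. Both alignments are assumed to be supplied as pairs of equal-length gapped strings, query first; this is not the output format of either server, and the function names are hypothetical.

```python
def aligned_pairs(query_aln, target_aln):
    """Set of (query_index, target_index) pairs matched in a gapped alignment."""
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    pairs = set()
    i = j = 0  # positions in the ungapped query and target sequences
    for q_res, t_res in zip(query_aln, target_aln):
        if q_res != "-" and t_res != "-":
            pairs.add((i, j))
        if q_res != "-":
            i += 1
        if t_res != "-":
            j += 1
    return pairs


def alignment_accuracy(predicted, reference):
    """Fraction of reference (structural) pairs reproduced by the predicted alignment."""
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(*predicted) & ref) / len(ref)


if __name__ == "__main__":
    structural = ("ABCDE", "ABCDE")   # e.g. from a structure superposition
    threading = ("AB-CDE", "ABC-DE")  # e.g. from a fold recognition server
    print(alignment_accuracy(threading, structural))
```

In the example, four of the five structurally aligned pairs are reproduced, giving 0.8; an alignment shifted by even one residue everywhere would score 0 on this strict measure, which is why looser, shift-tolerant variants are also used.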

Step-by-step example:

Note: actual results for this example may differ from those described below because of updates to the databases and the training set on the prediction server.

1. Selecting a test structure without obvious homologs.

Using the PDB, we test structures from the list of new structures by searching for their sequence homologs.

Select a new structure from the PDB Web site http://www.rcsb.org/pdb via the "Last Update" link.

Go to the "Sequence Details" page.

Determine the chain letter you are going to search for.
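If the coordinate file for the entry has been downloaded, the sequence of the chosen chain can also be read directly from its SEQRES records and written in FASTA format for the search. The sketch below assumes a standard PDB-format text file, in which the chain identifier of a SEQRES record is in column 12 and the three-letter residue names start at column 20. Non-standard residues (e.g., MSE, selenomethionine) are simply mapped to X; the function name, the hypothetical entry ID "1ABC" and the FASTA header layout are our own choices.

```python
# Standard amino acids only; anything else is written as X.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_fasta(pdb_text, chain_id, entry_id="query", width=60):
    """Return the SEQRES sequence of one chain as a FASTA-formatted string."""
    residues = []
    for line in pdb_text.splitlines():
        # Column 12 (index 11) holds the chain ID; residue names start at column 20.
        if line.startswith("SEQRES") and len(line) > 11 and line[11] == chain_id:
            residues.extend(line[19:].split())
    if not residues:
        raise ValueError(f"no SEQRES records found for chain {chain_id!r}")
    seq = "".join(THREE_TO_ONE.get(name, "X") for name in residues)
    lines = [f">{entry_id}:{chain_id}"]
    lines += [seq[k:k + width] for k in range(0, len(seq), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "\n".join([
        "SEQRES   1 A    5  GLY ILE VAL GLU GLN",
        "SEQRES   1 B    3  ALA MSE LYS",
    ])
    print(seqres_to_fasta(sample, "A", "1ABC"))
```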

Go to the "SearchFields" interface.

Select "FASTA Search" and regenerate the form.