Pharm 207/Bio 207
Using Internet Resources in Molecular Biology - Lecture 9
Protein Structure Prediction
Lecturer: Phil Bourne
Last Update: November 19, 2002
Lecture Outline
| TOC & Lecture Outline | Introduction | Reading Materials | Lecture Goals and Assignment | Examples |
Protein structure prediction, also referred to as the protein folding problem, is the prediction of the tertiary structure of a protein from its amino acid sequence. This is highly desirable given that (a) the structure of a protein tells us a great deal about the function of that biological macromolecule; (b) experimental techniques (predominantly X-ray crystallography) are time-consuming and expensive relative to the determination of primary sequence; and (c) many proteins (e.g. membrane proteins) do not lend themselves to experimental structure solution.
The underlying assumption in making such a prediction is that the structure can indeed be inferred from the sequence alone. Denaturation and renaturation studies tend to support this notion, although there is evidence of chaperones assisting the folding process in some cases.
The following figure provides a general overview of protein structure prediction. Numbers in square brackets refer to chapters in reference 1 in "Reading Materials".
Consider very briefly the methods embodied in the above figure.
Ab initio methods rely on the assumption that the folded protein is in its state of lowest free energy, and they attempt to compute that lowest-energy conformation from the possible interactions between the residues comprising the sequence. This is a very compute-intensive problem, and the simplifying assumptions that must be made preclude accurate prediction. However, in principle these could be the most successful methods since, unlike threading and homology modeling, they are not necessarily biased by existing structural information (this could be viewed as a bad thing if the real structure turns out to be highly homologous to a known one, or a good thing if it is an example of a new fold). These methods are composed of:
Many different types of empirical force field and levels of structure description are in use.
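To make the idea of searching for a lowest-energy conformation concrete, here is a minimal sketch using a toy 2D lattice "HP" model. The model is an assumption chosen for illustration and is not one of the force fields referred to above: each residue is either hydrophobic (H) or polar (P), the chain is a self-avoiding walk on a square grid, and every pair of H residues that touch on the grid without being chain neighbours contributes -1 to the energy. All function names are hypothetical.

```python
# Toy 2D lattice HP model: exhaustive search for the lowest-energy fold.
# This is an illustrative simplification, not a real empirical force field.

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def contact_energy(seq, coords):
    """Return -1 for each H-H pair that is adjacent on the lattice
    but not adjacent along the chain."""
    energy = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:
                    energy -= 1
    return energy


def lowest_energy_fold(seq):
    """Enumerate every self-avoiding walk and keep the best one.
    Returns (energy, list of (x, y) coordinates)."""
    best = (0, None)

    def extend(coords):
        nonlocal best
        if len(coords) == len(seq):
            energy = contact_energy(seq, coords)
            if best[1] is None or energy < best[0]:
                best = (energy, list(coords))
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in coords:  # self-avoidance
                coords.append(nxt)
                extend(coords)
                coords.pop()

    extend([(0, 0)])
    return best


if __name__ == "__main__":
    energy, coords = lowest_energy_fold("HPPHPPH")
    print(energy, coords)
```

The number of walks grows exponentially with chain length, which is why even this crude model becomes impractical for sequences of realistic size and why real methods depend on approximations.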
Comparative (homology) modeling is used when there is a clear relationship between the sequence of the unknown and one or more sequences for which known structures exist. In this way an approximately correct fold is assured. The problem then becomes one of fine tuning: small shifts in the main chain (1-2 Å), the orientation of the side chains, and the conformations of loop regions.
There are many cases where folds are the same yet sequences are very different; if not, there would not be much fun in the protein folding problem. The threading approach is to define a database of known core folds and then attempt to fit (thread) the sequence to each fold in that database, evaluating the suitability of each possible fit. In one sense this reduces an infinite problem (ab initio) to a one-in-a-thousand problem, since one could argue that there are approximately 1000 known folds (this number itself is subject to much argument). Assuming a fold can be recognized in this way, the problem then reduces to the same fine-tuning issues as in comparative modeling.
Just how well does each of these methods do? A recent answer to this question can be found from CASP3, the Third Meeting on the Critical Assessment of Techniques for Protein Structure Prediction, held at the Asilomar Conference Center, December 13-17, 1998.
In short, progress is being made, and for specific cases useful information (from a biological perspective) might be ascertained, but there is still a long way to go before any combination of procedures can be used with a measurable degree of certainty.
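The fit-and-score idea behind threading can be sketched in a few lines. The example below is a deliberately crude illustration under stated assumptions: the fold library, the two-state buried/exposed environment strings, the hydrophobic residue set and the +1/-1 scoring are all hypothetical, and the fit is ungapped. Real threading programs use much richer potentials and allow gaps.

```python
# Toy threading: score a sequence against each template in a small fold
# library and rank the folds. Everything here is a hypothetical example.

# Coarse, assumed set of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVILMFWC")

# Hypothetical fold library: one environment string per fold,
# 'b' = buried core position, 'e' = solvent-exposed position.
FOLD_LIBRARY = {
    "fold_A": "bbeebbee",
    "fold_B": "ebebebeb",
}


def position_score(residue, environment):
    """+1 if a hydrophobic residue sits in a buried position or a polar
    residue sits in an exposed one, otherwise -1."""
    hydrophobic = residue in HYDROPHOBIC
    buried = environment == "b"
    return 1 if hydrophobic == buried else -1


def thread(sequence, template):
    """Best ungapped placement of the sequence on one template.
    Returns (score, offset), or None if the sequence is longer."""
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(
            position_score(res, template[offset + i])
            for i, res in enumerate(sequence)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def rank_folds(sequence, library):
    """Return (score, fold name, offset) tuples, best fit first."""
    results = []
    for name, template in library.items():
        placed = thread(sequence, template)
        if placed is not None:
            results.append((placed[0], name, placed[1]))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    for score, name, offset in rank_folds("LLKKLLKK", FOLD_LIBRARY):
        print(name, score, offset)
```

The sequence never has to resemble the sequence of the template. Only its compatibility with the structural environments is scored, which is what lets threading recognize a shared fold when sequence similarity is absent.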
To learn how to use protein structure prediction servers.
List of newest protein structures:
1jx7a
Note: actual results for this example may differ from those described below because the databases and training sets on the prediction server are updated.
1. Selecting a test structure without obvious homologs.
Using the PDB, we test structures from the list of new structures by searching for sequence homologs of each.
Select a new structure from the PDB Web site http://www.rcsb.org/pdb from the "Last Update" link
Go to "Sequence Details" page
Determine the chain letter you are going to search for.
Go to the "SearchFields" Interface
Select "FASTA Search" and regenerate the form
Select PDB option and enter the PDB id and chain identifier
Find homologs (or lack thereof) based on E-values.
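The last step, deciding from E-values whether a structure has homologs, can be sketched as a simple filter. The `Hit` record, the hit list and the cutoff of 1e-3 are assumptions made for this example. The FASTA search form does not return data in this shape, and the appropriate cutoff depends on the database and the question being asked.

```python
# Sketch: split search hits into likely homologs and non-significant hits.
# The Hit record and the 1e-3 threshold are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Hit:
    pdb_id: str
    chain: str
    e_value: float


def split_by_e_value(hits, query_id, query_chain, threshold=1e-3):
    """Return (homologs, others), each sorted by increasing E-value.
    The query's own entry is skipped, since a self-hit says nothing
    about homologs."""
    homologs, others = [], []
    for hit in hits:
        is_self = (
            hit.pdb_id.lower() == query_id.lower()
            and hit.chain == query_chain
        )
        if is_self:
            continue
        if hit.e_value <= threshold:
            homologs.append(hit)
        else:
            others.append(hit)
    homologs.sort(key=lambda h: h.e_value)
    others.sort(key=lambda h: h.e_value)
    return homologs, others


if __name__ == "__main__":
    # Made-up hits for illustration only.
    example_hits = [
        Hit("1jx7", "A", 0.0),
        Hit("hitA", "A", 2e-30),
        Hit("hitB", "B", 4.5),
    ]
    homologs, others = split_by_e_value(example_hits, "1jx7", "A")
    print("homologs:", [h.pdb_id for h in homologs])
    print("not significant:", [h.pdb_id for h in others])
```

An empty `homologs` list is the outcome being looked for in this step: a structure with no obvious homologs makes a fair test case for a prediction server.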
2. Searching with the query sequence.
For a sequence with no homologs and a sequence with homologs:
Query sequence:
TMITPSSGNS ASGVQVADEV CRIFYDMKVR KCSTPEEIKK RKKAVIFCLS ADKKCIIVEE GKEILVGDVG VTITDPFKHF VGMLPEKDCR YALYDASFET KESRKEELMF FLWAPELAPL KSKMIYASSK DAIKKKFQGI KHECQANGPE DLNRACIAEK LGGSLIVAFE GCPV
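Many prediction servers expect the sequence without the spaces that group it into blocks of ten, often in FASTA format. The helper below is a small sketch of that clean-up. The header text `query` and the 60-column line width are arbitrary choices for this example, and individual servers may have their own input requirements.

```python
# Sketch: turn the space-grouped query sequence into a FASTA record.
# The header name and the line width are illustrative choices.

RAW_QUERY = (
    "TMITPSSGNS ASGVQVADEV CRIFYDMKVR KCSTPEEIKK RKKAVIFCLS ADKKCIIVEE "
    "GKEILVGDVG VTITDPFKHF VGMLPEKDCR YALYDASFET KESRKEELMF FLWAPELAPL "
    "KSKMIYASSK DAIKKKFQGI KHECQANGPE DLNRACIAEK LGGSLIVAFE GCPV"
)


def to_fasta(raw, header, width=60):
    """Remove all whitespace, check that only letters remain, and wrap
    the sequence at `width` residues per line under a '>' header."""
    residues = "".join(raw.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence contains non-letter characters")
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta(RAW_QUERY, "query"), end="")
```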
3. Analyzing results.
You will receive an e-mail that includes:
Entity:
The method used for this prediction was: gonnet+predss.
Most similar fold: 1cfya
MOL_ID: 1; MOLECULE: COFILIN; CHAIN: A, B; ENGINEERED: YES
Alignment:
bb hhhhhhhhh hhhhhhhhhhbbbbbb bbbbb b
VQVADEVCRIFYDMKVRKCSTPEEIKKRKKAVIFCLSADKKCIIVEEGKE
| ||||    | | |           |  |   | |   |  | | |
VAVADESLTAFNDLK.........LGKKYKFILFGLNDAKTEIVVKE...
bbhhhhhhhhhhhh hhh bbbbbbbb bbbbbbb bbbb hhhhh hhhhhhhhhh bbb
ILVGDVGVTITDP.FKHFVGMLPEKDCRYALYDASFET..KESRKEELMF
        | |||    |   ||| || || ||   |    |       |
........TSTDPSYDAFLEKLPENDCLYAIYDFEYEINGNEGKRSKIVF
bb hhhhhhh bbbbbbbbbbb bbbbbbbb bbb bbbbb hhhhhhhh bbbbb hhhhhhhh
FLWAPELAPLKSKMIYASSKDAIKKKFQGIKHECQANGPEDLNRACIAEK
| | |  ||  ||| |||||||      |     |             |
FTWSPDTAPVRSKMVYASSKDALRRALNGVSTDVQGTDFSEVSYDSVLER
bbbb hhhhhhhhhhhhhhhhhhh bbbbbbbb hhhhhhhhhhhhh
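In an alignment like the one above, each `|` marks a column where the query and the matched fold have the same residue, and `.` marks a gap. The sketch below recomputes that marker row and a simple identity fraction from the first pair of sequence rows. The function names are hypothetical, and counting identity over columns where neither row has a gap is one common convention, not necessarily the one the server uses.

```python
# Sketch: rebuild the '|' identity row for one block of the alignment
# and compute the fraction of identical residues over aligned columns.

QUERY_ROW = "VQVADEVCRIFYDMKVRKCSTPEEIKKRKKAVIFCLSADKKCIIVEEGKE"
TEMPLATE_ROW = "VAVADESLTAFNDLK.........LGKKYKFILFGLNDAKTEIVVKE..."


def match_line(query_row, template_row, gap="."):
    """Return a string with '|' under identical residues, ' ' elsewhere."""
    if len(query_row) != len(template_row):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q == t and q != gap else " "
        for q, t in zip(query_row, template_row)
    )


def identity_fraction(query_row, template_row, gap="."):
    """Identical columns divided by columns where neither row has a gap."""
    markers = match_line(query_row, template_row, gap)
    aligned = sum(
        1 for q, t in zip(query_row, template_row) if q != gap and t != gap
    )
    return markers.count("|") / aligned if aligned else 0.0


if __name__ == "__main__":
    print(QUERY_ROW)
    print(match_line(QUERY_ROW, TEMPLATE_ROW))
    print(TEMPLATE_ROW)
    print(round(identity_fraction(QUERY_ROW, TEMPLATE_ROW), 3))
```

Printing the three rows in a fixed-width font restores the column alignment that is needed to read which residues correspond.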
4. Comparison of threading result to direct structure superposition
For your pet protein, perform a homology modeling and threading experiment and report on: