Gribskov & Smith

BIMM 141 Laboratory

Spring 2001

Introduction to Bioinformatics


 

Exercise 9

 

Protein Structure Prediction

and 3D Visualization


The objectives of this exercise are to become more familiar with protein sequence analysis, particularly structure prediction and motif or pattern searching, and with protein 3D visualization using Web tools.

The classic Lambda Cro protein dimer binding to Lambda operator DNA via its helix-turn-helix motif (from 3cro.pdb)


A beta-beta sandwich protein domain, the classic Ig fold


Dictyostelium cAMP-dependent Protein Kinase, catalytic subunit
modeled by Swiss-Model

All of the above are visualized using RasMol







Specific Tasks to Perform in Exercise 9:

A. Secondary Structure Prediction
1. Use PEPTIDESTRUCTURE to predict secondary structure by the Chou-Fasman method.
2. Use PEPPLOT to predict secondary structure by the GOR method.
3. Compare and evaluate predictions.
B. Transmembrane Sequences
1. Use web services to predict transmembrane sequences.
2. Compare prediction to Kyte-Doolittle style hydrophobicity plot.
3. Use PSORT and SignalP to predict signal peptides.
C. Homology Modeling
1. Use Swiss-Model to get a homology-based structure prediction
D. Protein 3D Structure Visualization
1. Use RasMol on your Sun Solaris computer
a. Activate RasMol
b. RasMol Documentation and Tutorials
c. Use RasMol on a PDB structure or two
2. Databases of annotated 3D images
a. Use of the SWISS-3DIMAGE database of 3D images
b. Use of the SCOP database of 3D images
E. Questions

As you perform the Exercise, add comments to your Lab Notebook and answer relevant Questions at the end.

 

{A. Secondary Structure Prediction }

For this part of the exercise, choose a protein with a known three-dimensional structure. Proteins with known structures can be found in PDB (http://www.rcsb.org) and then looked up in Entrez.

Alternatively, choose one of your favorite proteins in SwissProt at ExPASy, making sure that its SwissProt entry cross-references a PDB entry.
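To compare predictions with the actual structure (step 3 below), it helps to have the observed secondary structure as one letter per residue. Many PDB files record it in HELIX and SHEET records. The following Python sketch assumes those records follow the fixed-column PDB format, handles one chain at a time, and ignores insertion codes for simplicity; the function names `observed_ss` and `ss_string` are hypothetical and not part of GCG or any PDB tool.

```python
# Sketch: observed secondary structure from PDB HELIX/SHEET records.
# PDB column numbers are 1-based, so each slice below starts one lower.
# Insertion codes are ignored (a simplifying assumption).

def observed_ss(pdb_lines, chain="A"):
    """Return {residue_number: 'H' or 'E'} for one chain."""
    ss = {}
    for line in pdb_lines:
        record = line[0:6]
        if record == "HELIX ":
            # initChainID col 20, initSeqNum cols 22-25,
            # endChainID col 32, endSeqNum cols 34-37
            start_chain, start = line[19], int(line[21:25])
            end_chain, end = line[31], int(line[33:37])
            state = "H"
        elif record == "SHEET ":
            # initChainID col 22, initSeqNum cols 23-26,
            # endChainID col 33, endSeqNum cols 34-37
            start_chain, start = line[21], int(line[22:26])
            end_chain, end = line[32], int(line[33:37])
            state = "E"
        else:
            continue
        if start_chain == chain and end_chain == chain:
            for resnum in range(start, end + 1):
                ss[resnum] = state
    return ss


def ss_string(ss, first, last):
    """One letter per residue from first..last; unassigned residues are coil 'C'."""
    return "".join(ss.get(i, "C") for i in range(first, last + 1))
```

For a file you have saved (the name here is hypothetical), `observed_ss(open("myprotein.pdb").readlines(), "A")` gives the map for chain A, and `ss_string` turns it into a string you can line up against a prediction.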

{1. Use GCG PEPTIDESTRUCTURE and PLOTSTRUCTURE to predict secondary structure by the Chou-Fasman method. }

Note that PEPTIDESTRUCTURE modifies the Chou-Fasman algorithm. Read the documentation to understand how.

Use the output file from PEPTIDESTRUCTURE as the input file for PLOTSTRUCTURE to obtain a graphical image of your results.
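To see the core idea behind Chou-Fasman-style prediction, the sketch below implements a simplified helix rule in Python: a six-residue window with at least four helix formers nucleates a helix, which is then extended while the four-residue window at its growing edge keeps an average P-alpha of at least 1.0. This is NOT PEPTIDESTRUCTURE's modified algorithm, it handles helices only, and its propensity table is a six-residue excerpt; treating residues missing from the excerpt as neutral (1.0) is purely an assumption to keep the example short. All function names are hypothetical.

```python
# Sketch of a SIMPLIFIED Chou-Fasman-style helix rule (helices only).
# Not GCG PEPTIDESTRUCTURE's modified algorithm.

# Excerpt of helix propensities (P-alpha) as commonly tabulated from
# Chou & Fasman (1978); verify against the original table before use.
P_ALPHA_EXCERPT = {"E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "G": 0.57, "P": 0.57}


def propensities(seq, table, default=1.0):
    """Per-residue P-alpha; residues missing from `table` get `default` (an assumption)."""
    return [table.get(aa, default) for aa in seq.upper()]


def helix_nuclei(seq, table=P_ALPHA_EXCERPT, width=6, min_formers=4, cutoff=1.0):
    """0-based starts of windows with >= min_formers residues above cutoff
    and a mean propensity above cutoff."""
    values = propensities(seq, table)
    starts = []
    for i in range(len(values) - width + 1):
        window = values[i:i + width]
        formers = sum(1 for p in window if p > cutoff)
        if formers >= min_formers and sum(window) / width > cutoff:
            starts.append(i)
    return starts


def extend_helix(values, start, end, cutoff=1.0):
    """Grow the half-open segment [start, end) while the 4-residue window
    that includes the next residue keeps a mean propensity >= cutoff."""
    while end < len(values) and sum(values[end - 3:end + 1]) / 4 >= cutoff:
        end += 1
    while start > 0 and sum(values[start - 1:start + 3]) / 4 >= cutoff:
        start -= 1
    return start, end


def predict_helices(seq, table=P_ALPHA_EXCERPT, width=6):
    """Half-open (start, end) helix segments after nucleation, extension and merging."""
    values = propensities(seq, table)
    segments = []
    for i in helix_nuclei(seq, table, width):
        start, end = extend_helix(values, i, i + width)
        if segments and start <= segments[-1][1]:
            prev_start, prev_end = segments[-1]
            segments[-1] = (min(prev_start, start), max(prev_end, end))
        else:
            segments.append((start, end))
    return segments
```

Filling in the full table from Chou and Fasman's published parameters, and adding the corresponding beta-sheet rule, would be needed before comparing this sketch's output with PEPTIDESTRUCTURE's.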

{2. Use GCG PEPPLOT to predict secondary structure by the GOR method. }

Note that the GOR prediction is only written out to a file when you check the "File for Garnier predictions" box in the optional parameters. Note the other text files that can also be obtained.
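GOR works differently from Chou-Fasman: for each residue it adds up information values contributed by every residue in a 17-residue window (8 on each side) and predicts the state (helix, extended, turn, or coil) with the largest total. The Python sketch below shows that scoring loop only; `info` is a hypothetical function standing in for the published GOR information tables, which are not reproduced here, and this is not PEPPLOT's code.

```python
# Sketch of the GOR I scoring loop (not GCG PEPPLOT's implementation).
from typing import Callable

STATES = ("H", "E", "T", "C")  # helix, extended, turn, coil
HALF_WINDOW = 8                # 17-residue window: 8 residues on each side


def gor_predict(seq: str, info: Callable[[str, str, int], float]) -> str:
    """For each residue j, sum info(state, residue at j+m, m) for m = -8..+8
    and keep the state with the largest sum (ties go to the earlier state)."""
    prediction = []
    for j in range(len(seq)):
        scores = {}
        for state in STATES:
            total = 0.0
            for m in range(-HALF_WINDOW, HALF_WINDOW + 1):
                k = j + m
                if 0 <= k < len(seq):  # positions off either end contribute nothing
                    total += info(state, seq[k], m)
            scores[state] = total
        prediction.append(max(STATES, key=lambda s: scores[s]))
    return "".join(prediction)
```

Letting window positions that fall off the ends contribute nothing is one of several possible conventions for the termini.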

{3. Compare and evaluate the predictions made by the above methods to each other and to the actual structure. }
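A simple, widely used way to score a prediction is Q3: the percentage of residues whose three-state assignment (helix H, strand E, coil C) matches the observed structure. The Python sketch below assumes you have written both the prediction and the observed structure as equal-length strings of one-letter codes; the 8-to-3 reduction of DSSP-style codes it uses is one common convention, not the only one, and the function names are hypothetical.

```python
# One common reduction of DSSP-style 8-state codes to 3 states (an assumption;
# other reductions exist). Anything not listed, e.g. T, S, C or blank, is coil.
TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss):
    return "".join(TO_THREE.get(c, "C") for c in ss.upper())


def _checked(predicted, observed):
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and cover the same residues")
    return to_three_state(predicted), to_three_state(observed)


def q3(predicted, observed):
    """Percentage of residues whose 3-state assignment matches."""
    pred, obs = _checked(predicted, observed)
    return 100.0 * sum(p == o for p, o in zip(pred, obs)) / len(obs)


def per_state_recall(predicted, observed):
    """For each observed state, the percentage of its residues predicted correctly."""
    pred, obs = _checked(predicted, observed)
    recall = {}
    for state in "HEC":
        idx = [i for i, o in enumerate(obs) if o == state]
        if idx:
            recall[state] = 100.0 * sum(pred[i] == state for i in idx) / len(idx)
    return recall
```

Per-state recall is worth looking at alongside Q3, since a method that predicts mostly coil can score deceptively well on a coil-rich protein.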


 

{B. Transmembrane Sequences}

For this part of the exercise, choose a known transmembrane protein that has several annotated transmembrane regions. One way to find one is to search Entrez for "transmembrane region" and use the Netscape Find function to search the results for "sp|": SWISS-PROT entries have their putative transmembrane regions annotated. Also identify a protein sequence containing a signal peptide by searching for "signal peptide precursor". Avoid hypothetical and putative sequences.

There are a number of services that predict transmembrane regions, including:

PredictProtein (http://www.embl-heidelberg.de/predictprotein/predictprotein.html) ... email

TMPred (http://www.microbiology.adelaide.edu.au:80/learn/tmpred.html) ... on-line, fast

TMAP (http://www.mbb.ki.se/tmap/index.html) ... email, requires a multiple sequence alignment

DAS (http://www.sbc.su.se/~miklos/DAS/) ... on-line, fast

SOSUI (http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0.html) ... on-line, fast

HMMTOP (http://www.enzim.hu/~tusi/hmmtop/) ... on-line, fast

PSORT Transmembrane and signal peptide (http://psort.nibb.ac.jp/index.html) ... on-line, fast

SignalP Signal sequence (http://www.cbs.dtu.dk:80/services/SignalP/) ... on-line, medium

{1. Use one or more of the web services to predict transmembrane sequences. }

Use these Web-accessible methods to predict the locations of transmembrane sequences. Note that "inside" and "outside" refer to the two sides of the lipid bilayer. TMAP will also provide predictions for homologs of the query sequence (when I, MG, ran it, TMAP took about 15 minutes to finish).
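Comparing several services is easier if you write each one's transmembrane segments as (start, end) residue ranges, together with the TRANSMEM ranges from the SWISS-PROT entry. The Python sketch below (function names hypothetical) counts a reference segment as found if any predicted segment overlaps it by at least `min_overlap` residues; that overlap criterion is an assumption, and a stricter one may be more appropriate.

```python
# Sketch: compare transmembrane segments from two sources.
# Segments are inclusive (start, end) residue ranges typed in by hand.

def overlap(a, b):
    """Number of residues shared by inclusive ranges a and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare_segments(reference, predicted, min_overlap=1):
    """Split segments into found / missed reference segments and extra predictions."""
    def hits(seg, others):
        return any(overlap(seg, o) >= min_overlap for o in others)

    return {
        "found": [r for r in reference if hits(r, predicted)],
        "missed": [r for r in reference if not hits(r, predicted)],
        "extra": [p for p in predicted if not hits(p, reference)],
    }
```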

{2. Compare prediction to Kyte-Doolittle style hydrophobicity plot using the GCG program PEPPLOT. }

The hydrophobicity plot is panel j of the PEPPLOT output. Which regions would you predict as transmembrane based on this plot? Are they different from the predictions above? Which prediction agrees best with the annotation in the sequence entry?
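A Kyte-Doolittle plot is a sliding-window average of per-residue hydropathy values. The Python sketch below uses the Kyte-Doolittle (1982) scale with the 19-residue window and cutoff of 1.6 that Kyte and Doolittle suggested for spotting membrane-spanning segments; PEPPLOT's panel j may use a different window, so check its documentation before comparing numbers directly. The function names are hypothetical, and non-standard residue letters (X, B, Z) are not handled.

```python
# Kyte & Doolittle (1982) hydropathy scale.
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}


def hydropathy(seq, window=19):
    """Mean hydropathy of each full window; element i covers residues i..i+window-1 (0-based)."""
    values = [KD[aa] for aa in seq.upper()]  # unknown letters raise KeyError
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge windows whose mean exceeds threshold into 1-based inclusive ranges."""
    segments = []
    for i, h in enumerate(hydropathy(seq, window)):
        if h > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # extend the current segment
            else:
                segments.append((start, end))
    return segments
```

The merged ranges are typically wider than the true hydrophobic stretch, because every window that only partly overlaps the stretch can still pass the cutoff.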

{3. Use PSORT and SignalP to predict signal peptides. }

PSORT also predicts transmembrane regions. Does its prediction agree with the ones above? Do PSORT and SignalP predict the same signal peptide?
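Signal peptidase cleavage sites tend to follow von Heijne's (-3,-1) rule: small, uncharged residues at positions -1 and -3 relative to the cleavage site. The Python sketch below lists positions that satisfy a simplified version of that rule near the N-terminus. It is not how SignalP or PSORT predict (SignalP, for instance, uses trained neural networks), and the residue sets, the search range of 15 to 35, and the function name are assumptions chosen for illustration.

```python
# Sketch of von Heijne's (-3,-1) rule; NOT SignalP's or PSORT's method.
# The residue sets and the default search range are illustrative assumptions.
ALLOWED_MINUS1 = set("AGSCT")     # small, uncharged residues at -1
ALLOWED_MINUS3 = set("AGSCTVIL")  # small or aliphatic residues at -3


def candidate_cleavage_sites(seq, first=15, last=35):
    """1-based positions p (first <= p <= last) such that cleavage between
    residue p and p+1 satisfies the simplified (-3,-1) rule."""
    seq = seq.upper()
    sites = []
    # Start at 3 so position -3 exists; stop so at least one mature residue remains.
    for p in range(max(first, 3), min(last, len(seq) - 1) + 1):
        minus1 = seq[p - 1]  # residue p (1-based) is position -1
        minus3 = seq[p - 3]  # residue p-2 (1-based) is position -3
        if minus1 in ALLOWED_MINUS1 and minus3 in ALLOWED_MINUS3:
            sites.append(p)
    return sites
```

A real signal peptide also needs a hydrophobic core upstream of the site, which this sketch ignores; that is one reason it will flag many more positions than SignalP does.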


 

{C. Homology Modeling }

Few resources for homology modeling are yet available on the Web. The main ones are the PredictProtein (http://www.embl-heidelberg.de/predictprotein/) server, home of the PHD prediction methods, and the Swiss-Model (http://www.expasy.org/swissmod/SWISS-MODEL.html) server.

It is fine to use a sequence whose structure is unknown, but it must be a homologue of a protein whose structure IS known.
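Before submitting, it is worth checking how closely your sequence matches its structural template, for example from a BLAST or FASTA alignment, since models built from closer homologues are generally more reliable. Percent identity has several definitions; the Python sketch below uses identical positions divided by gap-free aligned columns, one common choice. The function name is hypothetical, and this is not Swiss-Model's own template-selection procedure.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical positions / gap-free aligned columns, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)
```

Dividing by the full alignment length, or by the length of the shorter sequence, gives different (usually lower) numbers, so say which definition you used when you report it.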

By now you should be able to submit a query easily, although making sense of the output is a little more difficult. Both of these services return their results by email.


{1. Use Swiss-Model to get a homology-based structure prediction}

Select "Normal" mode to get a more complete output. How realistic do you think this prediction is?

The following are examples of files produced by Swiss-Model:

Swiss-Model BLAST search for homologs

Swiss-Model FASTA search for model structure

Swiss-Model multiple alignment

Swiss-Model Preliminary 3D model

Swiss-Model Final 3D model
 
 

 

{D. Protein 3D Structure Visualization}

Many tools for visualizing protein structure in three dimensions are available on the Web, and probably more of them run on microcomputers (PCs and Macs) than on Unix computers. RasMol was written by Roger Sayle of GlaxoWellcome, who has made it free for 'educational purposes' (read: PR for Glaxo...). RasMol handles the 'nuts and bolts' of displaying a protein: translation, rotation, and slab cutting, plus rendering in a number of styles such as ball and stick, ribbons, space filling, and stereo, with abundant use of color to emphasize different features of the protein. Residues can be labeled. RasMol is NOT an analysis tool, but it is an excellent, fast visualization tool.

The Free Molecular Visualization Software page at U Mass, Amherst, provides links to learn about and download many other 3D Viewers, including Chime, Swiss-PdbViewer, Cn3D, WebLab, MAGE, MolView, and LinusLite, as well as other relevant links.

WebLab ViewerLite from MSI, for PCs and Macs, is more powerful and robust than RasMol, but it is 'RAM hungry' and runs very slowly on Macs with only 32 MB of RAM. InsightII, used in Exercise 10, is also from MSI.

MAGE, the viewer for Kinemage images, has been used extensively in education, including the Kinemages routinely published with the journal 'Protein Science' and a Kinemage disk accompanying the Branden and Tooze textbook 'Introduction to Protein Structure'.

 

{1. Use RasMol on your Sun Solaris Computer.}

RasMol runs as a separate, standalone application, and ACS has installed it on the Sun Solaris computers in 4306 York Hall. Although RasMol can also be used as a Netscape plug-in, on these Sun computers it is installed only as a standalone application.

We will use RasMol to visualize protein 3D structures in several different ways.

Information on the ACS implementation of RasMol is available at the Web site:

       http://www-acs.ucsd.edu/offerings/userhelp/HTML/rasmol,d.html

You can find this site by going to the UCSD ACS Web site and using its "Search" facility to search for "RasMol".

 

{a. Activate RasMol on your Sun computer}

Open a Console or Terminal window and, at the % prompt, type:

     prep rasmol

 

{b. RasMol Documentation and Tutorials}

Several Tutorials on RasMol are available from the University of Massachusetts RasMol Web site.

This Web page also has several PDB protein structure files suitable for visualization using RasMol, specifically those used in one of the tutorials, the "Coulson Tutorial".

Here we will use only RasMol's basic features, those available through the pull-down menus and the mouse. The more powerful and detailed features (selecting and hiding atoms, residues, chains, ligands, groups of residues, etc.) are accessible only through commands typed at the RasMol command line; the tutorials are useful for learning these commands.

For additional information, one can consult the RasMol Manual and a Univ. Mass. FAQ page.

Versions of RasMol for Mac and PC computers can be obtained simply by downloading them. Major sites are the RasMol Home Page at U Mass in Amherst and the 3D Viewer Help Page at Sacch3D, Stanford Univ. The Sacch3D site also provides links to sites for downloading other 3D viewers. These sites also have documentation, as well as examples of RasMol use, user groups, bibliographies, etc.

 

{c. Use RasMol on a PDB structure or two.}

{Obtain at least two *.pdb structure files for use with RasMol}

There are RasMol example files available on the Univ Mass RasMol tutorial Web page.

You can also obtain *.pdb structure files from PDB with a keyword search.
The PDB identifier of a protein can also be found in the annotation of its SwissProt entry, which also links to the PDB data file.
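Before loading a structure into RasMol, it can help to know what the file contains. The Python sketch below reads the fixed-column ATOM and HETATM records of a PDB-format file and reports how many polymer residues (protein or nucleic acid) each chain has and how many copies of each hetero group (ligands, waters) are present. The function name is hypothetical, and the sketch assumes standard fixed-column records.

```python
from collections import defaultdict


def summarize_pdb(pdb_lines):
    """Return ({chain: polymer residue count}, {hetero group name: copy count})."""
    polymer = defaultdict(set)
    hetero = defaultdict(set)
    for line in pdb_lines:
        record = line[0:6]
        if record not in ("ATOM  ", "HETATM"):
            continue
        res_name = line[17:20].strip()      # columns 18-20
        chain = line[21]                    # column 22
        key = (int(line[22:26]), line[26])  # resSeq cols 23-26, insertion code col 27
        if record == "ATOM  ":
            polymer[chain].add(key)
        else:
            hetero[res_name].add((chain,) + key)
    return ({c: len(k) for c, k in polymer.items()},
            {n: len(k) for n, k in hetero.items()})
```

For a protein-DNA complex you would expect both protein and DNA chains to be listed, which tells you what to select when you start coloring parts of the structure in RasMol.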

 

{Open a *.pdb file in RasMol and see what RasMol can do}

The simplest way to open a *.pdb file in RasMol is to open a Console or Terminal window on your Unix computer and type the following at the % prompt:

     rasmol <filename>.pdb

Alternatively, start RasMol without a file by typing at the % prompt:

     rasmol

Then choose Open from the File pull-down menu in RasMol, and enter the filename <filename>.pdb when RasMol prompts for it in the Terminal or Console window.

You can learn about RasMol in two ways:

  1. Try options in the Pulldown Menus on the Unix computer and see what they do.
    Move the mouse around on the RasMol screen and see what happens.
  2. Read some of the documentation in the Tutorials or in the RasMol Manual.
    These documents provide information on how you can operate RasMol from the RasMol Command Line, e.g. to 'translate' the molecule on the screen, do 'fine adjustment' rotation, etc.

Excellent documentation written by Roger Sayle, the creator of RasMol, is available as "RasMol V2.5: Molecular Visualisation Program" at Cambridge, England.

 

{Try RasMol on the file 3cro.pdb that came with the RasMol download. Try to reproduce the image shown at the top of this Exercise.}

This file can be obtained from PDB by searching for the keywords "lambda cro".
Alternatively, it can be copied into your account from an ACS directory on insci14 by running the following command at the Unix % prompt:

     cp /software/nonrdist/rasmol/RasMol2/data/3cro.pdb .

Note the period at the end of this Unix command: it is very important, because "." (your current directory) is the destination of the copy.

 

{2. Databases of annotated 3D images.}

There are several Web Sites which present databases of Protein 3D images which can be visualized by RasMol and other 3D Viewers. The SWISS-3DIMAGE database is one of the major ones. Others include the NIH Center for Molecular Modeling; the NIH Molecules R Us facility; SCOP (Structural Classification of Proteins) at MRC, Cambridge; CATH (Class-Architecture-Topology-Homology) at Univ London, England; and The Protein Science Kinemage Index of PDB structures, imaged using the MAGE 3D viewer. And this is by NO MEANS a complete list ...

Here we use the SWISS-3DIMAGE and SCOP databases as examples of such sites.
 
 

{a. Use of the SWISS-3DIMAGE database of 3D images.}

Go to the SWISS-3DIMAGE Home Page at ExPASy and spend some time reading and examining the information.

Access the SWISS-3DIMAGE database and bring up the entry for your favorite protein whose structure is known and whose coordinates are in PDB. Note that the entry looks like a typical SwissProt database entry. Click on the '3D_IMAGE' link to visualize the protein's 3D structure.

Click on the 'RasMol' link, which lets you save the protein's PDB data for visualization in RasMol. Save it as a file in your account, then start RasMol and open this file.

Try to manipulate the structure in RasMol so as to reproduce the image displayed in Netscape by the 'GIF' or 'JPEG' links in the entry.

Note the additional image links present in the SWISS-3DIMAGE entry. See what some of these look like by clicking on the 'GIF' or 'JPEG' links. Which come up faster, the 'GIF' or the 'JPEG' links?
 
 

{b. Use of the SCOP database of 3D images.}

SCOP (Structural Classification Of Proteins), a product of the Chothia group at Cambridge, England, provides visual and text information about the structural elements within a given protein and within classes and families of proteins. The SCOP database also provides the reverse information: all proteins that have specific structural elements.
 
 
{Learn about the SCOP database and facility}

Go to the SCOP database and click on the 'search' link at the bottom or on the Help icon at the top (the one with a ?); either link goes to the SCOP Help page. Read some of this page to learn about SCOP and what can be done at this facility.

 

{Maneuver up and down through the SCOP hierarchy.}

Use the access method "Keyword search of SCOP entries" to find your favorite protein whose structure is known.

Note the icons at the top of the SCOP page. Click on some of these to see what they do. They permit you to move up and down through the SCOP hierarchy of objects: Protein <--> Family <--> Superfamily. Get a feel for this hierarchy. Click on the Chime icons of objects of interest to visualize their structures.

Note the Lineage Objects and the PDB Entry Domains. Examine some of these to see what they are.

Note the link to the 'RasMol graphics interface' (shown as an icon on the page); it permits downloading a *.pdb file for visualization with RasMol.


 

{E. Questions:}

Answer questions given above as well as the following:

  1. Give at least two reasons why the molecular weight calculated for a protein based on its sequence may be wrong. How might your reasons be related to protein function?
  2. How might post-translational modification events affect protein secondary structure?
  3. How might such modification events be important to "proteomics"?
  4. What is the main feature in "normal" globular proteins that could be confused with a transmembrane region when using the Kyte-Doolittle method? What distinguishes these regions from "true" transmembrane regions?
  5. Are all transmembrane sequences extremely hydrophobic? Why or why not? What is an amphipathic protein?
  6. What are the main alpha helix forming and breaking residues in the Chou-Fasman secondary structure prediction method?
  7. What are the main beta sheet forming and breaking residues in the Chou-Fasman secondary structure prediction method?
  8. Why is a proline a breaking residue for both alpha and beta structures?
  9. What are the preferred residues at the central two positions of a turn? Are turns simply hydrophilic regions? Explain, basing your answer on either the Chou-Fasman or GOR prediction method.
  10. Describe at least two potential problems with the Profile approach to describing sequence motifs. What are the advantages of a position-specific weight matrix approach as compared to a regular expression approach?
  11. What does the consensus sequence shown at the left-hand side of the profile represent? Would you expect to get equivalent (to PROFILESEARCH) results if you simply used the consensus sequence in a BLAST search?
  12. What is the most difficult part of the unsupervised learning approach used by MEME: finding the description of the motif, finding the locations of the motif, or determining the width of the motif? Why?
  13. What are the major Display options found in RasMol? Why might one choose each of these options?
  14. What do each of the Color options distinguish in their color patterns in RasMol?
  15. What do each of the Options pulldown menu choices do in RasMol?
  16. What Display and Color options were used to obtain the Cro-operator graphic at the top of this exercise?
  17. What are the basic steps used in Homology Modeling, for example, as done by Swiss-Model?
  18. How do the Swiss-3DImages compare with those in RasMol?
  19. What are the "Lineage" objects in SCOP?
  20. What are the two primary visualization tools used in SCOP?

 

 





Latest modification: Spring, 2001

If you have problems or questions, send email to Michael Gribskov, Doug Smith, or Hiren Patel.