Gribskov & Smith
BIMM 141 Laboratory
Spring 2001
Introduction to Bioinformatics
The objectives of this exercise are to become more familiar with protein sequence analysis, particularly structure prediction and motif or pattern searching, and with protein 3D visualization using Web tools.

The classic Lambda Cro protein dimer binding via Helix-Turn-Helix to the Lambda Operator ... from 3cro.pdb

A beta-beta sandwich protein domain, the classic Ig fold

Dictyostelium cAMP-dependent Protein Kinase, catalytic subunit (modeled by Swiss-Model!)
All of the above are visualized using RasMol.
As you perform the Exercise, add comments to your Lab Notebook and answer relevant Questions at the end.
For this part of the exercise, choose a protein with a known three-dimensional structure. Proteins with known structures can be found in PDB (http://www.rcsb.org) and then looked up in Entrez.
Alternatively, choose one of your favorite proteins in SwissProt at ExPASy that has a PDB entry listed in its SwissProt documentation.
{1. Use GCG PEPTIDESTRUCTURE and PLOTSTRUCTURE to predict secondary structure by the Chou-Fasman method. }
Note that PEPTIDESTRUCTURE modifies the Chou-Fasman algorithm. Read the documentation to understand how.
Use the output file from PEPTIDESTRUCTURE as input file to PLOTSTRUCTURE to obtain a graphics image of your results.
{2. Use GCG PEPPLOT to predict secondary structure by the GOR method. }
Note that the GOR prediction is only written out to a file when you check the "File for Garnier predictions" box in the optional parameters. Note the other text files that can also be obtained.
{3. Compare and evaluate the predictions made by the above methods to each other and to the actual structure. }
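One simple way to make this comparison quantitative is to write each prediction and the actual structure as strings of per-residue states and count the positions where they agree. The sketch below is only an illustration, not part of GCG: it assumes you have already reduced each result by hand to a three-state string (H = helix, E = strand, C = everything else), and the example strings are invented.

```python
def three_state_agreement(predicted: str, observed: str) -> float:
    """Return the fraction of positions at which two state strings agree.

    Both strings are assumed to use one letter per residue
    (H = helix, E = strand, C = other) and to cover the same residues.
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Invented example strings, 16 residues each (not from any real protein).
chou_fasman = "CCHHHHHHCCEEEECC"
gor = "CHHHHHHCCCEEECCC"
observed = "CCHHHHHHHCCEEECC"

if __name__ == "__main__":
    for name, prediction in (("Chou-Fasman", chou_fasman), ("GOR", gor)):
        score = three_state_agreement(prediction, observed)
        print(f"{name}: {score:.1%} of residues agree with the structure")
```

A single percentage hides where the disagreements fall, so also note in your Lab Notebook whether the errors are at the ends of helices and strands or whether whole elements are missed.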
For this part of the exercise, choose a known transmembrane protein sequence that has several annotated transmembrane regions. You can find one by searching Entrez for "transmembrane region" and using the Netscape find function to search the results for "sp|"; SWISS-PROT entries will have putative transmembrane regions annotated. Also identify a protein sequence containing a signal peptide by searching for "signal peptide precursor". Avoid hypothetical and putative sequences.
There are a number of services that predict transmembrane regions, including:
PredictProtein (http://www.embl-heidelberg.de/predictprotein/predictprotein.html) ... email
TMPred (http://www.microbiology.adelaide.edu.au:80/learn/tmpred.html) ... on-line, fast
TMAP (http://www.mbb.ki.se/tmap/index.html) ... email, requires multiple seq alignment
DAS (http://www.sbc.su.se/~miklos/DAS/) ... on-line, fast
SOSUI (http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0.html) ... on-line, fast
HMMTOP (http://www.enzim.hu/~tusi/hmmtop/) ... on-line, fast
PSORT Transmembrane and signal peptide (http://psort.nibb.ac.jp/index.html) ... on-line, fast
SignalP Signal sequence (http://www.cbs.dtu.dk:80/services/SignalP/) ... on-line, medium
{1. Use one or more of the web services to predict transmembrane sequences. }
Use these web-accessible methods to predict the locations of transmembrane sequences. Note that "outside" and "inside" refer to the two sides of the lipid bilayer. TMAP will also provide predictions for homologs of the query sequence (when I, MG, ran it, TMAP took about 15 minutes to finish).
{2. Compare prediction to Kyte-Doolittle style hydrophobicity plot using the GCG program PEPPLOT. }
The hydrophobicity plot is panel j. What regions would you predict as transmembrane based on this plot? Are they different from the predictions above? Which prediction agrees best with the documentation in the sequence entry?
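To see what a Kyte-Doolittle style plot is actually computing, it can help to reproduce the calculation by hand. The following is a minimal sketch, not the PEPPLOT implementation: it averages the published Kyte-Doolittle hydropathy values over a sliding window. The window of 19 residues and the cutoff of 1.6 are the values commonly quoted for spotting membrane-spanning segments and are used here as assumptions; PEPPLOT's own window and scaling may differ, and the function names are made up for this example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence.

    Element i of the result is the average over residues i .. i+window-1
    (0-based). Non-standard residue codes such as X raise KeyError.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue: add the new one, drop the oldest.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_windows(profile: list, threshold: float = 1.6) -> list:
    """0-based start positions of windows whose mean exceeds the threshold."""
    return [i for i, value in enumerate(profile) if value > threshold]


if __name__ == "__main__":
    # Invented test sequence: a leucine stretch flanked by charged residues.
    toy = "DDDDD" + "L" * 19 + "KKKKK"
    profile = hydropathy_profile(toy)
    print(["%.2f" % value for value in profile])
```

Runs of consecutive window starts returned by `candidate_windows` correspond to the broad peaks you would pick out by eye on the plot.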
{3. Use PSORT and SignalP to predict signal peptides. }
PSORT also predicts transmembrane regions. Does the prediction agree with the one above? Is the prediction of signal peptide the same with PSORT and SignalP?
Not many resources for three-dimensional structure prediction are yet available on the net. The main ones are the PredictProtein (http://www.embl-heidelberg.de/predictprotein/) server, also known as PHD, and the Swiss-Model (http://www.expasy.org/swissmod/SWISS-MODEL.html) server.
It is OK to use a sequence whose structure you don't know, but it must be a homologue of a protein whose structure IS known.
By now you should be familiar enough with these servers to submit a query easily, although making sense of the output is a little more difficult. Both of these services return their results by email.
{1. Use Swiss-Model to get a homology-based structure prediction. }
Select "Normal" mode to get a more complete output. How realistic do you think this prediction is?
The following are examples of files produced by Swiss-Model:
Swiss-Model BLAST search for homologs
Swiss-Model FASTA search for model structure
Swiss-Model multiple alignment
Swiss-Model Preliminary 3D model
Swiss-Model Final 3D model
Many tools are available on the Web for visualizing protein structure in three dimensions, and many of them run on microcomputers, probably more than run on Unix computers. RasMol is the product of Roger Sayle of GlaxoWellcome, who has made the product free for 'educational purposes' (read: PR for Glaxo...). RasMol does the 'nuts and bolts' of handling a protein image: translation, rotation, and slab cutting, plus display in a number of styles such as ball and stick, ribbons, space filling, and stereo, with abundant use of color to emphasize different features of the protein. Residues can be labeled. RasMol is NOT an analysis tool, but it is an excellent, rapid visualization tool.
The Free Molecular Visualization Software page at U Mass, Amherst, provides links to learn about and download many other 3D Viewers, including Chime, Swiss-PdbViewer, Cn3D, WebLab, MAGE, MolView, and LinusLite, as well as other relevant links.
WebLab ViewerLite from MSI, for PCs and Macs, is more powerful and robust than RasMol, but is 'RAM hungry' and runs very slowly on Macs having only 32 Mbytes RAM. InsightII, used in Exercise 10, is also from MSI.
MAGE, used for Kinemage images, has been used extensively for educational purposes, including routine Kinemage images with all publications of the journal 'Protein Science' and a Kinemage disk to be used with the Branden and Tooze 'Introduction to Protein Structure' textbook.
RasMol is implemented on computers as a separate, standalone application, and has been implemented on the Sun Solaris computers in 4306 York Hall by the ACS. Although RasMol can be used as a Netscape plugin unit, the implementation on these Sun computers is only as a standalone application.
We will use RasMol to visualize Protein 3D structures in several different ways.
Information on the ACS implementation of RasMol is available at the Web site:
http://www-acs.ucsd.edu/offerings/userhelp/HTML/rasmol,d.html
You can find this site by going to the UCSD ACS Web site and using the "Search" facility to search for "Rasmol".
{a. Activate RasMol on your Sun computer}
Open a Console or Terminal window and at the % prompt, do: prep rasmol
{b. RasMol Documentation and Tutorials}
Several Tutorials on RasMol are available from the University of Massachusetts RasMol Web site.
This Web page also has several PDB protein structure files suitable for visualization using RasMol, specifically those used in one of the tutorials, the "Coulson Tutorial".
Here we will use only the superficial elements of RasMol, those used via pulldown Menus and the Mouse.
The more powerful and detailed elements are accessible only from Commands delivered to RasMol on a command line (select and hide atoms, residues, chains, ligands, groups of residues, etc); the Tutorials are useful for learning these commands.
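To give a flavor of the command-line side, the sketch below assembles a short RasMol command script as text. It is written in Python only so that the script can be generated for any file name. The RasMol commands it emits (load, select, wireframe, ribbons, colour) are standard ones as far as I know, but treat the particular combination as an illustrative assumption and check each command against the RasMol Manual for your version. The function name and the choice of display styles are hypothetical.

```python
def build_rasmol_script(pdb_file: str) -> str:
    """Return the text of a small RasMol command script for one PDB file.

    The display choices (ribbons for protein, wireframe for nucleic acid)
    are just one example; they are not prescribed by this exercise.
    """
    commands = [
        "# example RasMol script generated for this exercise",
        f"load {pdb_file}",
        "select all",
        "wireframe off",      # hide the default wireframe display
        "select protein",
        "ribbons",            # draw the protein backbone as ribbons
        "colour structure",   # colour by secondary structure
        "select nucleic",
        "wireframe on",       # show any DNA or RNA as wireframe
        "colour cpk",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(build_rasmol_script("3cro.pdb"), end="")
```

Each line of the output can also be typed one at a time at the RasMol prompt in the Terminal or Console window, which is a good way to see what every command does.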
For additional information, one can consult the RasMol Manual and a Univ. Mass. FAQ page.
Versions of RasMol are available for Mac and PC computers just by downloading. Major sites are the RasMol Home Page at U Mass in Amherst and the 3D Viewer Help Page at Sacch3D, Stanford Univ. Links to sites for downloading other 3D Viewers are also provided at the Sacch3D site. These sites also have documentation available, as well as examples of use of RasMol, user groups, bibliography, etc.
{c. Use RasMol on a PDB structure or two.}
{Obtain at least two *.pdb structure files for use with RasMol}
There are RasMol example files available on the Univ Mass RasMol tutorial Web page.
You can also obtain *.pdb structure files from PDB using a keyword search.
The name of the PDB file can also be learned from the protein entry annotation at SwissProt; links to the PDB data file are also provided at SwissProt.
{Open a *.pdb file in RasMol and see what RasMol can do}
The simplest way to open a *.pdb file in RasMol is to open a Console or Terminal window on your Unix computer and do the following at the % prompt: rasmol <filename>.pdb
Alternatively, you can do at the % prompt: rasmol
Then choose Open from the File pulldown menu in RasMol, and enter the filename <file>.pdb in response to the RasMol prompt in the Terminal or Console window.
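Before opening a file it can be useful to know what is in it. A *.pdb file is plain text with fixed-width columns, so a few lines of code can report how many chains and residues it contains. The sketch below is an illustration under stated assumptions: it reads only ATOM records, takes the chain identifier from column 22 and the residue number from columns 23-26 (plus the insertion code in column 27) as laid out in the PDB format description, and ignores HETATM records and multiple models. The function name is made up for this example.

```python
def residues_per_chain(lines) -> dict:
    """Count distinct residues in each chain, using ATOM records only.

    `lines` is any iterable of text lines, for example an open file.
    Nucleotides in DNA or RNA chains are counted as residues too.
    """
    chains = {}
    for line in lines:
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        chain_id = line[21]                       # column 22
        residue_key = (line[22:26].strip(),       # columns 23-26
                       line[26:27].strip())       # column 27, insertion code
        chains.setdefault(chain_id, set()).add(residue_key)
    return {chain: len(residues) for chain, residues in chains.items()}


if __name__ == "__main__":
    # Invented, truncated records (coordinates omitted) for demonstration.
    sample = [
        "ATOM      1  N   MET A   1",
        "ATOM      2  CA  MET A   1",
        "ATOM      3  N   GLN A   2",
        "ATOM      4  N   MET B   1",
    ]
    print(residues_per_chain(sample))
```

To run it on a real file, pass an open file handle, for example `with open("3cro.pdb") as handle: print(residues_per_chain(handle))`.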
You can learn about RasMol in two ways:
Excellent documentation written by Roger Sayle, the creator of RasMol, is available under RasMol V2.5: Molecular Visualisation Program at Cambridge, England.
{Try RasMol on the file 3cro.pdb that came with the RasMol download. Try to reproduce the image shown at the top of this Exercise.}
This file can be obtained from PDB by searching for the keywords "lambda cro".
Alternatively, it can be copied into your account from an ACS directory on insci14 by executing the following command at the Unix % prompt:
cp /software/nonrdist/rasmol/RasMol2/data/3cro.pdb .
Note the period at the end of this Unix command ... very important, because it tells cp to place the copy in your current directory.
There are several Web Sites which present databases of Protein 3D images which can be visualized by RasMol and other 3D Viewers. The SWISS-3DIMAGE database is one of the major ones. Others include the NIH Center for Molecular Modeling; the NIH Molecules R Us facility; SCOP (Structural Classification of Proteins) at MRC, Cambridge; CATH (Class-Architecture-Topology-Homology) at Univ London, England; and The Protein Science Kinemage Index of PDB structures, imaged using the MAGE 3D viewer. And this is by NO MEANS a complete list ...
Here we use the SWISS-3DIMAGE and SCOP databases as examples of such sites.
{a. Use of the SWISS-3DIMAGE database of 3D images.}
Go to the SWISS-3DIMAGE Home Page at ExPASy and spend some time reading and examining the information.
Access the SWISS-3DIMAGE database and bring up the entry of your favorite Protein whose structure is known and whose coordinates are in PDB. Note that an entry in the typical SwissProt database style comes up. Click on the '3D_IMAGE' link to visualize the Protein 3D structure.
Click on the 'RasMol' link. This will permit you to save the Protein PDB info for visualization in RasMol. Save this as a file in your account. Turn on RasMol and bring up this file.
Try to manipulate the figure so as to reproduce the figure brought up in Netscape from the 'GIF' or 'JPEG' links below.
Note the additional Images links present in the SWISS-3DIMAGE entry. See what some of these look like by clicking on the 'GIF' or 'JPEG' links. Which come up faster, the 'GIF' or 'JPEG' links?
{b. Use of the SCOP database of 3D images.}
SCOP (Structural Classification Of Proteins), a product of the Chothia group at Cambridge, England, provides visual and text information about structural elements within a given Protein and within classes and families of Proteins. The SCOP database also provides the reverse information: all Proteins that have a specific Structural element.
{Learn about the SCOP database and facility}
Go to the SCOP database and click on the 'search' link at the bottom or on the Help icon at the top (the one with a ?); either link goes to the SCOP Help page. Read some of this page to learn about SCOP and what can be done at this facility.
{Maneuver up and down through the SCOP hierarchy.}
Use the Access method of "Keyword search of SCOP entries" to find your favorite Protein whose structure is known.
Note the Icons at the top of the SCOP page. Click on some of these to see what they do. These permit you to move up and down through the SCOP hierarchy of objects: Protein <--> Family <--> Superfamily. Get a feel for this hierarchy. Click on the Chime Icons of Objects of interest to visualize their structures.
Note the Lineage Objects and the PDB Entry Domains. Examine some of these to see what they are.
Note the link to the 'RasMol graphics interface' Icon. This permits downloading of a *.pdb file for visualization with RasMol.
Answer questions given above as well as the following:
Latest modification: Spring, 2001
If you have problems or questions, send email to Michael Gribskov, Doug Smith, or Hiren Patel.