Exercise 9 - Spring, 2001 - BIMM 141
KEY
Points: 240 pts
15 pts A. Secondary Structure Prediction
30 pts B. Transmembrane Sequences
20 pts C. Homology Modeling
20 pts D. Protein 3D Structure Visualization
155 pts E. Questions
{A. Secondary Structure Prediction }
10 points {1. Use GCG PEPTIDESTRUCTURE and PLOTSTRUCTURE to predict secondary structure by the Chou-Fasman method. }
10 points {2. Use GCG PEPPLOT to predict secondary structure by the GOR method. }
5 points {3. Compare and evaluate the predictions made by the above methods to each other and to the actual structure. }
{B. Transmembrane Sequences}
10 points {1. Use one or more of the web services to predict transmembrane sequences. }
10 points {2. Compare prediction to Kyte-Doolittle style hydrophobicity plot using the GCG program PEPPLOT. }
10 points {3. Use PSORT and SignalP to predict signal peptides. }
{C. Homology Modeling }
20 points {1. Use Swiss-Model to get a homology-based structure prediction. }
{D. Protein 3D Structure Visualization}
10 points {1. Use RasMol on your Sun Solaris Computer.}
{a. Activate RasMol on your Sun computer}
{b. RasMol Documentation and Tutorials}
{c. Use RasMol on a PDB structure or two.}
{Obtain at least two *.pdb structure files for use with RasMol}
{Open a *.pdb file in RasMol and see what RasMol can do}
{Try RasMol on the file 3cro.pdb that came with the RasMol download. Try to reproduce the image shown at the top of this Exercise.}
10 points {2. Databases of annotated 3D images.}
{a. Use of the SWISS-3DIMAGE database of 3D images.}
{b. Use of the SCOP database of 3D images.}
{Learn about the SCOP database and facility}
{Maneuver up and down through the SCOP hierarchy.}
{E. Questions:}
Answer questions given in the Exercise text as well as the following:
1. Give at least two reasons why the molecular weight calculated
for a protein based on its sequence may be wrong. How might your
reasons be related to protein function? [10 pts]
Molecular weight calculated based on the
sequence cannot take into account post-translational modifications
such as glycosylation, phosphorylation or other chemical modification
of residues. A sequence-based calculation also cannot account for
processing such as removal of a signal sequence or cleavage by a
protease. Protein functions that depend on these modifications
include signal transduction, passage through membranes, function
as a membrane protein, and any function involving glycosylation.
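A minimal sketch of this point in Python: the mass computed from the sequence alone shifts once a signal peptide is removed or residues are phosphorylated. The residue masses are standard average values; the precursor sequence, the cleavage position, and the function names are hypothetical and chosen only for illustration.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153      # one water is added back for the free N and C termini
PHOSPHATE = 79.9799  # net mass added per phosphorylation (HPO3)


def sequence_mass(seq):
    """Average molecular weight predicted from the sequence alone."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mature_mass(seq, signal_length=0, n_phospho=0):
    """Mass after removing a signal peptide and adding phosphate groups."""
    return sequence_mass(seq[signal_length:]) + n_phospho * PHOSPHATE


# Hypothetical precursor: a 20-residue leader followed by a short mature chain.
precursor = "MKALLLLSLLALLAGPASAA" + "DSEESTKYDE"
print(round(sequence_mass(precursor), 1))
print(round(mature_mass(precursor, signal_length=20, n_phospho=2), 1))
```

Glycosylation would shift the mass in the other direction, often by far more than a phosphate group, and could be added to the sketch in the same way.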
2. How might post-translational modification events affect
protein secondary structure? [5 pts]
Post-translational modification events can
alter nearly every physical property of a protein: size, charge,
and hydrophobicity. A phosphorylation, glycosylation, or
adenylation event changes the local environment substantially,
making changes in the secondary structure and/or 3D structure
likely. Even a methylation or acetylation event could cause a
substantial change in 3D structure ...
3. How might such modification events be important to "proteomics"?
[5 pts]
"Proteomics" is the elucidation
of the totality of protein-related events in the cell: concentration
of all proteins in a given environment, including post-translationally
modified protein variants. Such variants can be considered
to be "new" proteins, often with very different function
than the original translation product. Protein phosphorylation
cascades are a good example. Hence, post-translational
modification events are of critical importance to proteomics.
4. What is the main feature in "normal" globular
proteins that could be confused with a transmembrane region when
using the Kyte-Doolittle method? What distinguishes these regions
from "true" transmembrane regions? [5 pts]
The Kyte-Doolittle method is looking for
long stretches of hydrophobic residues. The most similar feature
of globular proteins is a hydrophobic region where the protein
chain is buried in the interior of the protein. These regions
are typically beta strands and differ from transmembrane sequences
primarily in being much shorter (6-8 residues vs 17-22).
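The length difference can be sketched with a sliding-window hydropathy average in Python. The hydropathy values are the published Kyte-Doolittle scale; the window of 19 and the cutoff of 1.6 are a commonly used rule of thumb, and the two test sequences and function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def has_tm_candidate(seq, window=19, cutoff=1.6):
    """True if any window average exceeds the cutoff."""
    return any(mean > cutoff for mean in hydropathy_profile(seq, window))


flank = "DKSGENTSRQ" * 2                              # polar flanking sequence
buried_strand = flank + "VIVLAIV" + flank             # 7 hydrophobic residues
tm_segment = flank + "LIVALLAVGLIVFALLIGVA" + flank   # 20 hydrophobic residues
print(has_tm_candidate(buried_strand))  # False
print(has_tm_candidate(tm_segment))     # True
```

With a short window (7 to 9 residues) the buried strand would also produce a strong peak, which is why the window length matters when looking for transmembrane segments.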
5. Are all transmembrane sequences extremely hydrophobic? Why
or why not? What is an amphipathic protein? [10 pts]
No. Only isolated transmembrane helices should
be hydrophobic on all sides. Helices that pack against other helices
to form a channel may have a hydrophilic side that points towards
the water-filled pore, or polar and charged residues that interact
with the other helices. Such alpha helices are called amphipathic
helices, and proteins that contain amphipathic secondary structure
features are called amphipathic proteins.
6. What are the main alpha helix forming and breaking residues
in the Chou-Fasman secondary structure prediction method? [5 pts]
The main helix forming residues (H) are ala,
glu, leu and met.
The main helix breaking residues (B) are
proline and glycine.
7. What are the main beta sheet forming and breaking residues
in the Chou-Fasman secondary structure prediction method? [5 pts]
The main beta sheet forming residues (H)
are ile, val, and tyr.
The main beta sheet breaking residues (B)
are pro, asp, and glu.
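As a rough illustration of how these formers and breakers are used, the Python sketch below averages propensities over a short segment. The numbers are the commonly quoted Chou-Fasman parameters for the residues named in answers 6 and 7 only, and should be checked against the table used in class. Treating every unlisted residue as neutral (1.0) is a simplifying assumption, and the function names are hypothetical.

```python
# Propensities for the residues named in answers 6 and 7 (commonly quoted values).
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "P": 0.57, "G": 0.57}
P_BETA = {"V": 1.70, "I": 1.60, "Y": 1.47, "P": 0.55, "D": 0.54, "E": 0.37}


def mean_propensity(segment, table, default=1.0):
    """Average propensity; residues missing from the table count as neutral."""
    return sum(table.get(aa, default) for aa in segment) / len(segment)


def classify(segment):
    """Label a segment using the usual cutoffs: <P_alpha> > 1.03, <P_beta> > 1.05."""
    alpha = mean_propensity(segment, P_ALPHA)
    beta = mean_propensity(segment, P_BETA)
    if alpha > 1.03 and alpha > beta:
        return "helix"
    if beta > 1.05 and beta > alpha:
        return "sheet"
    return "neither"


for segment in ["AEMLAE", "VIYVIV", "PGPGPG"]:
    print(segment, classify(segment))
```

The full method also applies nucleation and extension rules along the sequence; this sketch shows only the averaging step.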
8. Why is a proline a breaking residue for both alpha and beta
structures? [5 pts]
Proline's unique structure in which the
side chain is cyclically attached to the backbone gives it unique
structural properties. The ring restricts the backbone dihedral
angles it can adopt, and because its backbone nitrogen carries no
hydrogen it cannot donate the backbone hydrogen bonds that stabilize
alpha and beta structures.
9. What are the preferred residues at the central two positions
of a turn? Are turns simply hydrophilic regions? Explain, basing
your answer on either the Chou-Fasman or GOR prediction method.
[10 pts]
In the Chou and Fasman method, the central
positions of the turn (i+1 and i+2 position) show strong preferences
for pro (30%), ser (14%), lys, asp, arg, and thr (the latter four
about 10% each) at the first position, and asn (19%), gly (19%),
asp (18%), ser (13%), cys (12%) and tyr (11%) at the second position.
For the GOR method, it is more difficult
to define the central positions, but using i-1 to i+1, there are
strong preferences for pro, gly, asn, trp, and cys. These basically
agree with the CF parameters above with the exception of trp.
CF parameters indicate that trp is exceptionally common at the
i+3 position, explaining the difference.
10. Describe at least two potential problems with the Profile
approach to describing sequence motifs. What are the advantages
of a position-specific weight matrix approach as compared to a
regular expression approach? [15 pts]
The major problem with the "average
profile" approach described in the readings is that highly
conserved positions are modelled as corresponding to the residue
distribution implied by the PAM 250 matrix used to calculate the
profile. The other main problem with all kinds of motif description
is that they depend on a correct alignment. The main advantage
of a position-specific weight matrix approach relative to a regular
expression approach is that it provides a quantitative measure
of the likelihood of each residue being present at each position
in the motif. Due to this additional position-based quantitative
information, a weight-matrix based approach can do all that a
regular expression approach can do, plus accommodate partial matches,
with quantitative measures for evaluation of significance of a
given match.
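The contrast can be sketched in Python: a regular expression gives only a yes/no answer, while a weight matrix built from the same aligned instances gives a graded score. The motif instances, the pattern, the pseudocount, and the uniform background are all illustrative assumptions, not taken from the readings.

```python
import math
import re

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(instances, pseudocount=1.0):
    """Log-odds weight matrix (uniform background) from aligned motif instances."""
    background = 1.0 / len(ALPHABET)
    pwm = []
    for column in zip(*instances):
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pwm


def score(pwm, segment):
    """Sum the position-specific weights for a segment of motif length."""
    return sum(column[aa] for column, aa in zip(pwm, segment))


instances = ["GKSGT", "GKTGS", "GKSGS", "GRSGT"]  # hypothetical aligned motif
pattern = re.compile(r"G[KR][ST]G[ST]")           # regular expression for the same motif
pwm = build_pwm(instances)

for candidate in ["GKSGT", "GKAGT", "LPWDE"]:
    matched = pattern.fullmatch(candidate) is not None
    print(candidate, matched, round(score(pwm, candidate), 2))
```

The one-mismatch candidate GKAGT is rejected outright by the regular expression but still scores far above the unrelated peptide in the weight matrix, which is the partial-match behavior described above.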
11. What does the consensus sequence shown at the left-hand
side of the profile represent? Would you expect to get equivalent
(to PROFILESEARCH) results if you simply used the consensus sequence
in a BLAST search? [10 pts]
The consensus sequence represents the highest
scoring residue in each row. This corresponds to the residue that
best represents that column of the aligned sequences. A BLAST
search would likely return more False Positives (lower Specificity),
and probably would miss more distant True Positives (lower Sensitivity),
than would PROFILESEARCH using a well developed Profile.
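A small Python sketch of why the consensus carries less information than the profile: taking the top-scoring residue in each row discards the scores of every near-equivalent residue. The three-row profile and its scores are hypothetical.

```python
def consensus(profile):
    """Pick the highest-scoring residue from each row (position) of a profile."""
    return "".join(max(row, key=row.get) for row in profile)


# Hypothetical profile: one dict of residue scores per aligned position.
profile = [
    {"G": 1.5, "A": 0.9, "S": 0.4},
    {"K": 1.1, "R": 1.0, "Q": 0.2},
    {"S": 0.8, "T": 0.7, "A": 0.1},
]
print(consensus(profile))  # GKS
```

At the second and third positions R and T score almost as well as K and S, but a search with the consensus string alone has no way of knowing that.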
12. What is the most difficult part of the unsupervised learning
approach used by MEME? Finding the description of the motif, finding
the locations of the motif, or determining the width of the motif.
Why? [20 pts]
Determining the width of the motif is the
most difficult part of unsupervised learning approaches such as
MEME or the Gibbs sampler. Each additional column in the weight
matrix adds parameters to the model, and models with more parameters
will always fit the data better. Because of this, it is difficult
to tell if the addition of columns to the motif description improves
the model enough to justify adding them (since even the addition
of columns at random will improve the model's ability to describe
the sequences).
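This effect can be demonstrated on purely random sequences. In the Python sketch below, each column of a maximum-likelihood weight matrix is compared with a uniform background, and the cumulative log-likelihood gain keeps rising as columns are added even though no motif is present. The sequences, the uniform background, and the function name are assumptions made for the illustration.

```python
import math
import random


def column_gain(column, alphabet_size=20):
    """Log-likelihood gain (bits) of a fitted column over a uniform background."""
    n = len(column)
    gain = 0.0
    for aa in set(column):
        count = column.count(aa)
        gain += count * math.log2((count / n) / (1.0 / alphabet_size))
    return gain


random.seed(0)
alphabet = "ACDEFGHIKLMNPQRSTVWY"
# Ten random sequences of length 8: there is no real motif to find.
sequences = ["".join(random.choice(alphabet) for _ in range(8)) for _ in range(10)]

totals = []
total = 0.0
for width, column in enumerate(zip(*sequences), start=1):
    total += column_gain(column)
    totals.append(total)
    print(width, round(total, 1))
```

Because the raw fit always improves with width, choosing the width requires some penalty or significance criterion rather than the likelihood alone.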
13. What are the major Display options found in RasMol? Why
might one choose each of these options? [10 pts]
The major Display options are Wireframe,
Backbone, Stick, Spacefill, Ball & Stick, Ribbons, Strands,
and Cartoons.
Wireframe, Stick, and Ball & Stick are similar; they each
show all covalent bonds, with Wireframe the thinnest, and Ball
& Stick also showing a ball at the position of each atom.
Atoms or covalent Bonds are color-coded.
Ribbons and Strands are similar, depicting the backbone as a ribbon
of varying thickness (thick for alpha helix or beta sheet, otherwise
thin); Strands draws a series of parallel lines rather than a solid
ribbon. Cartoons shows secondary structure even more prominently,
with arrows for beta strands and helical ribbons for alpha helices;
"random coil" is a coil of small diameter. Spacefill shows a
space-filling model of each atom, and is used to look for "free space"
within a molecule. Backbone shows just the backbone of the molecule
(alpha carbons and peptide bond main-chain atoms, in Stick structure).
[5 pts for Wireframe, Stick, and Ball & Stick]
[5 pts for all the others]
14. What do each of the Color options distinguish in their
color patterns in RasMol? [10 pts]
Monochrome is black and white; CPK should
show the CPK element colors but also appears as black and white on
the Sun; Shapely provides a different color for each residue type,
to clearly distinguish between residues; Group colors each residue
by its position along the chain, as a smooth spectrum running from
one terminus to the other; Chain gives each chain its own color but
is black and white on the Sun; Temperature colors atoms by temperature
factor (B-factor, a measure of atomic mobility) but is just different
shades of blue on the Sun; Structure shows a different single-tone
color for each type of secondary structural element; and User permits
user-defined color schemes.
15. What do each of the Options pulldown menu choices do in
RasMol? [10 pts]
Slab Mode permits slab cuts through the
molecule; Hydrogens toggles whether hydrogen atoms are included;
Hetero Atoms toggles whether hetero atoms (groups such as ligands,
ions, and water that are not part of the polymer chains) are included;
Specular adds lighting effects to the atoms (best seen in Spacefill
mode); Shadows adds shadows, providing a 3D illumination effect;
Stereo provides two side-by-side views that form a stereo pair of the
molecule; Labels adds a label to each of the residues.
16. What Display and Color options were used to obtain the
Cro Operator graphic at the top of this Exercise 9? [5 pts]
The Display is Backbone, Color is Group.
17. What are the basic steps used in Homology Modeling, for
example, as done by Swiss-Model? [10 pts; 2 pts for any of the 5 steps below]
The basic steps are (see Lecture notes):
1. Define all Sequences related to the query
protein that can assume a particular 3D structure, i.e. the Inverse
Folding Problem is assumed to be solved.
2. Perform a Multiple Sequence Alignment
of these sequences
3. If two or more sequences have experimentally
determined structures, align these structures to determine the
Structurally Conserved Regions (SCRs). Otherwise use the
conserved regions of the multiple sequence alignment to define
the most highly conserved (and hence important) residues.
4. Use the conserved residues (Common Fold)
or SCRs as a Framework for Threading: match the sequence
positions of the query protein with those of the Structure. In
doing so, assign Equivalent Residues (those that line up
between the query protein and the structure), and minimize C-alpha
carbon distances. Use other protein properties (Ramachandran angles,
solvent accessibility, H-bonding, secondary structure similarity,
residue similarity, ...) in optimally making these assignments.
5. Use a variety of methods to assign structural
positions to Structurally Variable Regions (SVRs), notably
loop regions and R groups. For example, linkers (known loop sequences)
and Collar Extension can be used to model difficult loops. Side-chain
Rotamer libraries can be used to model preferred torsion angles
found in side chains.
6. Use computational methods to optimize
the structure: Simulated Annealing to maximize equivalent
positions and packing; Energy Minimization to optimize
atomic positions and include solvent effects; Molecular Dynamics
to look at stability of structure, notably loops and surface exposed
side-chains.
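Step 4 refers to minimizing C-alpha distances between equivalent residues. One common way to measure that quantity is the root-mean-square deviation (RMSD) over the paired C-alpha atoms, sketched below in Python. The coordinates are made up, the function name is hypothetical, and the sketch assumes the two structures have already been superimposed. It is not a description of what Swiss-Model itself computes.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD over equivalent C-alpha positions of two superimposed structures."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Made-up C-alpha coordinates (angstroms) for three equivalent residues.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
print(ca_rmsd(template, template))  # 0.0
print(ca_rmsd(template, model))     # 1.0
```

A lower RMSD over the equivalent residues indicates that the query chain follows the template framework more closely.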
18. How do the Swiss-3D Images compare with those in RasMol?
[5 pts]
The Swiss-3D Images are similar to those
of RasMol, but tend to have different combinations of the types
of Displays and the colors are different.
19. What are the "Lineage" objects in SCOP? [5 pts]
The "lineage" objects are the
groups, at successively lower (more specific) levels of the hierarchy,
to which the current item belongs.
20. What are the two primary visualization tools used in SCOP?
[5 pts]
RasMol and Chime