Exercise 9 - Spring, 2001 - BIMM 141

KEY



Points: 240 pts
15 pts A. Secondary Structure Prediction

30 pts B. Transmembrane Sequences
20 pts C. Homology Modeling
20 pts D. Protein 3D Structure Visualization
155 pts E. Questions
 
 

{A. Secondary Structure Prediction }

10 points {1. Use GCG PEPTIDESTRUCTURE and PLOTSTRUCTURE to predict secondary structure by the Chou-Fasman method. }

10 points {2. Use GCG PEPPLOT to predict secondary structure by the GOR method. }

5 points {3. Compare and evaluate the predictions made by the above methods to each other and to the actual structure. }
 
 

{B. Transmembrane Sequences}

10 points {1. Use one or more of the web services to predict transmembrane sequences. }

10 points {2. Compare prediction to Kyte-Doolittle style hydrophobicity plot using the GCG program PEPPLOT. }

10 points {3. Use PSORT and SignalP to predict signal peptides. }
 
 

{C. Homology Modeling }

20 points {1. Use Swiss-model to get a homology based structure prediction }
 
 

{D. Protein 3D Structure Visualization}

10 points {1. Use RasMol on your Sun Solaris Computer.}

{a. Activate RasMol on your Sun computer}

{b. RasMol Documentation and Tutorials}

{c. Use RasMol on a PDB structure or two.}

{Obtain at least two *.pdb structure files for use with RasMol}

{Open a *.pdb file in RasMol and see what RasMol can do}

{Try RasMol on the file 3cro.pdb that came with the RasMol download. Try to reproduce the image shown at the top of this Exercise.}

10 points {2. Databases of annotated 3D images.}

{a. Use of the SWISS-3DIMAGE database of 3D images.}

{b. Use of the SCOP database of 3D images.}

{Learn about the SCOP database and facility}

{Maneuver up and down through the SCOP hierarchy.}
 
 

{E. Questions:}

Answer questions given in the Exercise text as well as the following:

1. Give at least two reasons why the molecular weight calculated for a protein based on its sequence may be wrong. How might your reasons be related to protein function? [10 pts]
A molecular weight calculated from the sequence cannot account for post-translational modifications such as glycosylation, phosphorylation, or other chemical modification of residues. Nor can a sequence-based calculation account for processing such as removal of a signal sequence or cleavage by a protease. Protein functions related to these post-translational modifications include signal transduction, passage through membranes, functions as membrane proteins, and any protein function involving glycosylation.
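The size of these effects can be illustrated with a minimal sketch that sums average residue masses. The precursor sequence, the signal-peptide length, and the function names are hypothetical, and the mass added per phosphate (about 80 Da) is an approximation.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528   # one water added back for the free N- and C-termini
PHOSPHATE = 79.98  # approximate net gain per phosphorylation (HPO3)


def sequence_mass(seq):
    """Mass predicted from the sequence alone (no processing, no modifications)."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def mature_mass(seq, signal_length=0, n_phospho=0):
    """Mass after removing an N-terminal signal peptide and adding phosphates."""
    return sequence_mass(seq[signal_length:]) + n_phospho * PHOSPHATE


# Hypothetical precursor: the first 9 residues are assumed to be a signal peptide.
precursor = "MKLLVLLAAGSTPEK"
print("from sequence:", round(sequence_mass(precursor), 1))
print("processed, 1 phosphate:", round(mature_mass(precursor, 9, 1), 1))
```

Glycosylation is not modelled here because glycan masses vary widely, which is itself part of the reason a sequence-based value can be far from the measured one.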

2. How might post-translational modification events affect protein secondary structure? [5 pts]
Post-translational modifications can alter nearly all of the properties that shape a protein's 3D structure: size, charge, and hydrophobicity. A phosphorylation, glycosylation, or adenylation event changes the local environment substantially, making changes in the 3D structure and/or secondary structure likely. Even a methylation or acetylation event could cause a substantial change in 3D structure ...

3. How might such modification events be important to "proteomics"? [5 pts]
"Proteomics" is the elucidation of the totality of protein-related events in the cell: the concentrations of all proteins in a given environment, including post-translationally modified protein variants. Such variants can be considered "new" proteins, often with functions very different from that of the original translation product. Protein phosphorylation cascades are a good example. Hence, post-translational modification events are of critical importance to proteomics.

4. What is the main feature in "normal" globular proteins that could be confused with a transmembrane region when using the Kyte-Doolittle method? What distinguishes these regions from "true" transmembrane regions? [5 pts]
The Kyte-Doolittle method looks for long stretches of hydrophobic residues. The most similar feature of globular proteins is a hydrophobic region where the protein chain is buried in the interior of the protein. These regions are typically beta strands and differ from transmembrane sequences primarily in being much shorter (6-8 residues vs. 17-22).
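The length argument can be sketched as a sliding-window average over the Kyte-Doolittle hydropathy scale. The window of 19 residues and the cutoff of 1.6 are the commonly quoted settings for transmembrane detection and are treated here as assumptions; both test sequences and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Start positions (0-based) of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, mean in enumerate(profile) if mean > threshold]


# Hypothetical sequences: a 7-residue buried strand and a 20-residue helix,
# each flanked by polar residues.
buried_strand = "DKSE" * 3 + "VIVLVIA" + "DKSE" * 3
membrane_helix = "DKSE" * 3 + "L" * 20 + "DKSE" * 3

print(hydrophobic_windows(buried_strand))   # long window dilutes the short strand
print(hydrophobic_windows(membrane_helix))  # long hydrophobic run is detected
```

With a short window (for example 7) the buried strand scores as highly as the helix, which is why window length matters when the plot is used for transmembrane prediction.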

5. Are all transmembrane sequences extremely hydrophobic? Why or why not? What is an amphipathic protein? [10 pts]
Only isolated transmembrane helices should be hydrophobic on all sides. Helices that pack against other helices to form a pore may have a hydrophilic side that points towards the water-filled pore, or polar and charged residues that interact with the other helices. Such alpha helices are called amphipathic helices, and proteins that contain amphipathic secondary structure features are called amphipathic proteins.
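One way to quantify amphipathicity is a hydrophobic moment: each residue's hydropathy is treated as a vector pointing outward from the helix axis, with successive residues rotated by 100 degrees, and the vectors are summed. The sketch below uses the Kyte-Doolittle scale for convenience, which is an assumption (other scales are often used for this calculation); the sequences and the function name are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Per-residue hydrophobic moment for an ideal helix (100 degrees per residue)."""
    x = y = 0.0
    for i, aa in enumerate(seq.upper()):
        theta = math.radians(i * angle_deg)
        x += KD[aa] * math.cos(theta)
        y += KD[aa] * math.sin(theta)
    return math.hypot(x, y) / len(seq)


uniform_helix = "L" * 18              # hydrophobic on all sides
amphipathic_helix = "LKKLLKKLLKKLLK"  # Leu on one face, Lys on the other

print(round(hydrophobic_moment(uniform_helix), 3))
print(round(hydrophobic_moment(amphipathic_helix), 3))
```

A uniformly hydrophobic helix gives a moment near zero because the vectors cancel around the axis, while a helix with one polar face gives a large moment.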

6. What are the main alpha helix forming and breaking residues in the Chou-Fasman secondary structure prediction method? [5 pts]
The main helix forming residues (H) are ala, glu, leu and met.
The main helix breaking residues (B) are proline and glycine.

7. What are the main beta sheet forming and breaking residues in the Chou-Fasman secondary structure prediction method? [5 pts]
The main beta sheet forming residues (H) are ile, val, and tyr.
The main beta sheet breaking residues (B) are pro, asp, and glu.
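The former and breaker classes in questions 6 and 7 come from per-residue propensities. A minimal sketch of how such propensities are averaged over a segment follows; the numeric values are the commonly tabulated Chou-Fasman parameters quoted from memory, so treat them as assumptions and check them against the original table. This is only the averaging step, not the full nucleation and extension rules of the method.

```python
# Helix propensities P(alpha); values assumed, verify against the published table.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}
# Sheet propensities P(beta); values assumed, verify against the published table.
P_BETA = {
    "V": 1.70, "I": 1.60, "Y": 1.47, "F": 1.38, "W": 1.37,
    "L": 1.30, "C": 1.19, "T": 1.19, "Q": 1.10, "M": 1.05,
    "R": 0.93, "N": 0.89, "H": 0.87, "A": 0.83, "S": 0.75,
    "G": 0.75, "K": 0.74, "P": 0.55, "D": 0.54, "E": 0.37,
}


def mean_propensity(segment, table):
    """Average propensity of a segment for one structure type."""
    return sum(table[aa] for aa in segment.upper()) / len(segment)


def favoured_structure(segment):
    """Crude call: whichever mean propensity is higher, if it exceeds 1.0."""
    alpha = mean_propensity(segment, P_ALPHA)
    beta = mean_propensity(segment, P_BETA)
    if max(alpha, beta) <= 1.0:
        return "neither"
    return "helix" if alpha >= beta else "sheet"


for segment in ("AELMAE", "VIYVI", "PGPGD"):  # hypothetical segments
    print(segment, favoured_structure(segment))
```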

8. Why is a proline a breaking residue for both alpha and beta structures? [5 pts]
Proline's side chain is cyclically bonded to the backbone nitrogen, which gives it unique structural properties. The ring restricts the backbone dihedral angles, so proline fits poorly into regular alpha and beta structures, and because its backbone nitrogen carries no hydrogen it cannot donate the backbone hydrogen bonds those structures require.

9. What are the preferred residues at the central two positions of a turn? Are turns simply hydrophilic regions ? Explain, basing your answer on either the Chou-Fasman or GOR prediction method. [10 pts]
In the Chou and Fasman method, the central positions of the turn (i+1 and i+2 position) show strong preferences for pro (30%), ser (14%), lys, asp, arg, and thr (the latter four about 10% each) at the first position, and asn (19%), gly (19%), asp (18%), ser (13%), cys (12%) and tyr (11%) at the second position.
For the GOR method, it is more difficult to define the central positions, but using i-1 to i+1, there are strong preferences for pro, gly, asn, trp, and cys. These basically agree with the CF parameters above with the exception of trp. CF parameters indicate that trp is exceptionally common at the i+3 position, explaining the difference.

10. Describe at least two potential problems with the Profile approach to describing sequence motifs. What are the advantages of a position-specific weight matrix approach as compared to a regular expression approach? [15 pts]
The major problem with the "average profile" approach described in the readings is that highly conserved positions are modelled as corresponding to the residue distribution implied by the PAM 250 matrix used to calculate the profile. The other main problem, shared by all kinds of motif description, is that they depend on a correct alignment. The main advantage of a position-specific weight matrix approach relative to a regular expression approach is that it provides a quantitative measure of the likelihood of each residue being present at each position in the motif. With this additional position-based quantitative information, a weight-matrix approach can do all that a regular expression approach can do, and can also accommodate partial matches, with quantitative measures for evaluating the significance of a given match.
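The contrast between the two approaches can be sketched with a small log-odds weight matrix built from a hypothetical alignment. The uniform background, the pseudocount of 1, the aligned sequences, and the function names are all assumptions made for illustration; this is not how GCG builds its profiles.

```python
import math
import re
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(aligned, pseudocount=1.0):
    """Log-odds (base 2) weight matrix against a uniform background."""
    background = 1.0 / len(ALPHABET)
    total = len(aligned) + pseudocount * len(ALPHABET)
    pwm = []
    for column in zip(*aligned):
        counts = Counter(column)
        pwm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                    for aa in ALPHABET})
    return pwm


def score(pwm, segment):
    """Sum of the per-position log-odds scores for one segment."""
    return sum(column[aa] for column, aa in zip(pwm, segment))


def best_match(pwm, seq):
    """Return (score, start) of the best-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


aligned = ["GKSGT", "GKTGS", "GKSGS", "GRSGT"]  # hypothetical motif instances
pattern = re.compile(r"G[KR][ST]G[ST]")         # equivalent regular expression
pwm = build_pwm(aligned)

for candidate in ("GKSGT", "GKSGA", "WWWWW"):
    matches = bool(pattern.fullmatch(candidate))
    print(candidate, "regex:", matches, "pwm:", round(score(pwm, candidate), 2))
```

The regular expression rejects both the near miss GKSGA and the unrelated WWWWW in the same way, while the weight matrix ranks the near miss well above the unrelated segment.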

11. What does the consensus sequence shown at the left-hand side of the profile represent? Would you expect to get equivalent (to PROFILESEARCH) results if you simply used the consensus sequence in a BLAST search? [10 pts]
The consensus sequence represents the highest scoring residue in each row. This corresponds to the residue that best represents that column of the aligned sequences. A BLAST search would likely return more False Positives (lower Specificity), and probably would miss more distant True Positives (lower Sensitivity), than would PROFILESEARCH using a well developed Profile.

12. What is the most difficult part of the unsupervised learning approach used by MEME? Finding the description of the motif, finding the locations of the motif, or determining the width of the motif. Why? [20 pts]
Determining the width of the motif is the most difficult part of unsupervised learning approaches such as MEME or the Gibbs sampler. Each additional column in the weight matrix adds parameters to the model, and models with more parameters will always fit the data better. Because of this, it is difficult to tell whether adding columns to the motif description improves the model enough to justify them (since even columns added at random will improve the model's ability to describe the sequences).
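The claim that even random columns improve the fit can be checked numerically. In the sketch below each column is fitted with its own maximum-likelihood residue frequencies, and the gain in log-likelihood over a uniform background is computed; that gain is N times a Kullback-Leibler divergence, so it is never negative. The random data, the uniform background, and the function name are assumptions for illustration and are not taken from MEME.

```python
import math
import random
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def column_gain(column, background):
    """Log-likelihood gain (nats) from fitting a column with its own frequencies.

    Equals N * KL(observed || background), which is >= 0 for any column.
    """
    n = len(column)
    counts = Counter(column)
    return sum(c * math.log((c / n) / background[aa]) for aa, c in counts.items())


random.seed(0)
background = {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}

# Five columns of pure noise, 30 sequences each: no real motif is present.
columns = [[random.choice(ALPHABET) for _ in range(30)] for _ in range(5)]
gains = [column_gain(column, background) for column in columns]

running_total = 0.0
for width, gain in enumerate(gains, start=1):
    running_total += gain
    print("width", width, "cumulative gain", round(running_total, 2))
```

Because the cumulative gain rises with every added column, raw likelihood alone cannot pick the width; some penalty or significance criterion on the extra parameters is needed.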

13. What are the major Display options found in RasMol? Why might one choose each of these options? [10 pts]
The major Display options are Wireframe, Backbone, Stick, Spacefill, Ball & Stick, Ribbons, Strands, and Cartoons.
Wireframe, Stick, and Ball & Stick are similar; they each show all covalent bonds, with Wireframe the thinnest, and Ball & Stick also showing a ball at the position of each atom.  Atoms or covalent Bonds are color-coded.
Ribbons and Strands are similar, depicting the backbone as a ribbon of varying thickness (thick for alpha helix or beta sheet, otherwise thin); Strands simply shows a series of parallel lines rather than a solid ribbon. Cartoons shows secondary structure even more prominently, with arrows for beta strands and helical ribbons for alpha helices; "random coil" is a coil of small diameter. Spacefill shows a spacefilling model of each atom, and is used to look for "free space" within one of these molecules. Backbone shows only the backbone of the molecule, drawn as stick bonds connecting the alpha carbons.
[5 pts for Wireframe, Stick, and Ball & Stick]
[5 pts for all the others]

14. What do each of the Color options distinguish in their color patterns in RasMol? [10 pts]
Monochrome is black and white; CPK should be CPK colors but also appears as black and white on the Sun,
Shapely provides a different color for each residue, to clearly distinguish between residues;
Group colors residues by their position along each chain, as a smooth spectrum running from one terminus to the other; Chain gives each chain its own color but is black and white on the Sun; Temperature colors atoms by temperature factor (B-factor), a measure of atomic mobility, but is just different shades of blue on the Sun; Structure shows a different single-tone color for each type of secondary structural element; and User permits user-defined color schemes.

15. What do each of the Options pulldown menu choices do in RasMol? [10 pts]
Slab Mode permits slab cuts through the molecule; Hydrogens shows all hydrogen atoms; Hetero Atoms toggles the display of hetero atoms (non-polymer groups such as ligands, cofactors, and waters); Specular adds lighting effects to the atoms (best seen in Spacefilling mode); Shadows adds shadows, providing a 3D illumination effect; Stereo shows two side-by-side views that form a stereo pair of the molecule;
Labels adds labels to each of the residues.

16. What Display and Color options were used to obtain the Cro Operator graphic at the top of this Exercise 9? [5 pts]
The Display is Backbone, Color is Group.

17. What are the basic steps used in Homology Modeling, for example, as done by Swiss-Model? [10 pts; 2 pts each for any 5 of the steps below]
The basic steps are (see Lecture notes):
1. Define all Sequences related to the query protein that can assume a particular 3D structure, i.e. the Inverse Folding Problem is assumed to be solved.
2. Perform a Multiple Sequence Alignment of these sequences
3. If two or more sequences have experimentally determined structures, align these structures to determine the Structurally Conserved Regions (SCRs). Otherwise use the conserved regions of the multiple sequence alignment to define the most highly conserved (and hence important) residues.
4. Use the conserved residues (Common Fold) or SCRs as a Framework for Threading: match the sequence positions of the query protein with those of the Structure. In doing so, assign Equivalent Residues (those that line up between the query protein and the structure), and minimize C-alpha carbon distances. Use other protein properties (Ramachandran angles, solvent accessibility, H-bonding, secondary structure similarity, residue similarity, ...) in optimally making these assignments.
5. Use a variety of methods to assign structural positions to Structurally Variable Regions (SVRs), notably loop regions and R groups. For example, linkers (known loop sequences) and Collar Extension can be used to model difficult loops. Side-chain Rotamer libraries can be used to model preferred torsion angles found in side chains.
6. Use computational methods to optimize the structure: Simulated Annealing to maximize equivalent positions and packing; Energy Minimization to optimize atomic positions and include solvent effects; Molecular Dynamics to look at stability of structure, notably loops and surface exposed side-chains.
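The quantity minimized in step 4, the distance between equivalent C-alpha atoms, is usually summarized as a root-mean-square deviation (RMSD). The sketch below assumes the two structures are already superimposed and that equivalent residues have already been paired; the coordinates and the function name are hypothetical, and this is not a description of Swiss-Model's own procedure.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between paired C-alpha coordinates.

    Assumes the two coordinate lists are already superimposed and that
    position i in one list is equivalent to position i in the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions (angstroms) for three equivalent residues.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, -0.5, 0.0), (7.6, 0.5, 0.0)]

print(round(ca_rmsd(template, model), 3))
```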

18. How do the Swiss-3D Images compare with those in RasMol? [5 pts]
The Swiss-3D Images are similar to those of RasMol, but tend to combine the Display types differently and to use different colors.

19. What are the "Lineage" objects in SCOP? [5 pts]
The "lineage" objects are the groups, at successively lower (more specific) levels of the hierarchy, to which the current item belongs.

20. What are the two primary visualization tools used in SCOP? [5 pts]
RasMol and Chime