Exercise 7 - Spring, 2001 - BIMM 141

KEY

 

Points: 260 pts


30 pts A. UPGMA alignment using GCG PILEUP
40 pts B. Neighbor joining alignment using CLUSTAL W
40 pts C. Parsimony trees using GCG version of PAUP

20 pts D. Parsimony and Maximum Likelihood using Phylip
130 pts E. Questions

 

 

{A. UPGMA based progressive multiple alignment using GCG PILEUP.}

{1. Find a suitable family of sequences.}
[0 pts]
Indicate what and how sequences were selected.

{2. Align your sequences using PILEUP.}
[10 pts]
Show output file

{3. Repeat using different gap penalties.}
[10 pts]
Compare/contrast default PILEUP alignment with alignments having different gap penalties.
Do you get the same or similar alignments, or very different alignments?
How good are the various alignments?

{4. Generate UPGMA and neighbor joining trees from your alignment}
[10 pts]
Show DISTANCES output and GROWTREE output.

 

{B. Neighbor joining based progressive alignment with CLUSTALW.}

{1. Find out about the CLUSTAL W program.}
[0 pts]
Indicate this was done.

{2. Align your sequences using CLUSTAL W, using both default parameters and the "Slow/Fast pairwise alignments" option. Report these trees in your notebook and compare them with the ones generated in A4 above.}
[15 pts]
Show outputs
[5 pts for comparisons]

{3. Examine the alignment and manually correct poorly aligned regions, remove very large gaps, etc. What changes did you make to your alignment and why?}
[5 pts]
Indicate what you did

{5. Write out your final tree in Phylip (.ph) and GCG (.msf)(alignment) formats for use in the next two sections of this exercise.}
[20 pts]
Show output .ph and .msf formatted outputs

 

{C. Parsimony trees using the GCG version of PAUP.}

{1. Read the CLUSTAL alignment into the GCG PAUPSEARCH program. Construct a tree using the "Exhaustive tree search" and "Parsimony" options. Make a plot of this tree using the PAUPDISPLAY program.}
[10 pts]
Show output trees

{2. Construct a tree using the "Heuristic tree search" and "Parsimony" options. How does this tree compare to the tree in C1?}
[10 pts]
Show output file
[5 pts for tree; 5 pts for answer to question]

{3. Perform a bootstrap analysis of the parsimony tree using the "Bootstrap analysis using branch-and-bound search" and "Parsimony" options. Based on the partition analysis, what are the most likely alternatives to your tree? }
[10 pts]
Show output file
[5 pts for tree; 5 pts for answer to question]

{4. Perform a bootstrap analysis using the "Bootstrap analysis using neighbor-joining distance" option and compare this to the CLUSTALW analysis.}
[10 pts]
Show output file
[5 pts for tree; 5 pts for answer to question]

 

{D. Parsimony and maximum likelihood trees using Phylip.}

{1. Familiarize yourself with the Phylip PROTPARS program by reading the document in /software/nonrdist/phylip-3.6/doc/protpars.html. Run PROTPARS (/home/solaris/nsci/bi141s/protpars) and compare the tree to those of PAUPSEARCH. }
[10 pts]
Show output file
[5 pts for tree; 5 pts for comparison]

{2. Familiarize yourself with the Phylip PROML program by reading the document in /software/nonrdist/phylip-3.6/doc/proml.html. Run PROML (/home/solaris/nsci/bi141s/proml) and compare to the other trees you have constructed.}
[10 pts]
Show output file
[5 pts for tree; 5 pts for comparison]

 

{E. Questions.}

1. Explain what is meant by progressive alignment. [ 10 pts]
A progressive alignment starts with the dynamic programming alignment of two initial sequences. This alignment is then fixed and used as the core for aligning additional sequences one at a time. Progressive alignment is the only practical approach to multiple alignment because testing all possible alignments is too computationally expensive.
+8 for explanation of progressive alignment
+2 for explaining that other methods are computationally expensive
2. Why are progressive alignments the only practical multiple alignment techniques? [5 pts]
For reasons of computation time: a fully general approach (dynamic programming over all of the sequences at once) would require far too much computing power.
 
3. Explain the basic steps used by PILEUP in making a progressive alignment. [10 pts]
PILEUP generates a progressive alignment by first doing pairwise global dynamic programming alignments to create a distance matrix. Next, PILEUP determines which two sequences are the closest and aligns them by dynamic programming. This alignment then serves as the core for aligning subsequent sequences. A UPGMA (unweighted pair-group method using arithmetic averages) tree is used to determine the order for adding the rest of the sequences.

From the PILEUP help file:
PileUp begins by doing a pairwise alignment that scores the similarity between every possible pair of sequences. These similarity scores are used to create a
clustering order that can be represented as a dendrogram. The clustering
strategy represented by the dendrogram is called UPGMA that stands for
unweighted pair-group method using arithmetic averages (Sneath, P.H.A. and
Sokal, R.R. (1973) in Numerical Taxonomy (pp. 230-234), W.H. Freeman and
Company, San Francisco, California, USA).

PileUp uses this clustering order and first aligns the two most-related
sequences to each other in order to produce the first cluster. It then aligns
the next most related sequence to this cluster or the next two most-related
sequences to each other in order to produce another cluster. A series of such
pairwise alignments that includes increasingly dissimilar sequences and
clusters of sequences at each iteration produces the final alignment.
+7 for the steps (pairwise matrix generation; core alignment; addition of seqs)
+3 for UPGMA
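
The clustering order described above can be sketched in a few lines of Python. This is only an illustration of UPGMA on a made-up distance matrix, not PILEUP's actual code; the function name upgma_order and the sample distances are assumptions chosen for the example.

```python
from itertools import combinations

def upgma_order(names, dist):
    """Return the UPGMA merges as (cluster_a, cluster_b, average_distance).

    names: list of sequence names
    dist:  dict mapping frozenset({a, b}) -> pairwise distance
    """
    # Each cluster is a tuple of member names; start with one per sequence.
    clusters = [(n,) for n in names]
    merges = []

    def avg_dist(c1, c2):
        # UPGMA cluster distance: arithmetic mean over all member pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Join the two closest clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical distances among four sequences (illustrative values only).
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

merges = upgma_order(names, dist)
for left, right, d in merges:
    print(left, right, d)
```

With these distances A and B are joined first, C is added next, and D last, which is the order in which a progressive aligner guided by this tree would add the sequences.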

4. What type of tree is used as a guide tree by PILEUP? [5 pts]
UPGMA
 
5. What are the main enhancements in CLUSTAL with respect to PILEUP? [20 pts]
CLUSTAL, like PILEUP, uses dynamic programming to do pairwise alignments and makes a distance matrix. The difference is that CLUSTAL uses neighbor-joining to build the tree from the distance matrix.
Advantages of CLUSTAL include: (1) it can use either fast (ktup-based) or slow methods for calculating initial distances; (2) it uses multiple scoring tables to match sequences; (3) it can set up position-dependent gap penalties; (4) it can balance the weights of closely related sequences; and (5) it automatically creates a tree based on the final alignment.
+4 pts for each fact with a max of 5 facts.
6. What are the two most serious drawbacks common to all progressive alignments? [10 pts]
The most serious drawbacks to progressive alignments are (1) mistakes made early on are kept with no possibility of correction and (2) the estimate of the alignment's reliability is heuristic (not mathematically rigorous).
+8 for "mistakes"
+2 for "heuristics"
7. Describe two ways one can evaluate the correctness of a phylogenetic tree based on sequence data? [10 pts]
The two major sequence data-based tree-building methods are parsimony and maximum likelihood.
Parsimony trees are usually evaluated by bootstrapping. In this method, the data set is randomly sampled with replacement to create a new data set from which a tree is built. This is repeated 100-500 times for "quick and dirty" results and ~1000+ times for reliable/publishable results. A good tree/branch is one that is seen in ~95% of the bootstrap trees.
Maximum likelihood uses an "implicit" model by which to evaluate its own trees. Every tree has a likelihood score and the "best tree" is selected as the tree with the greatest log probability.
(Alternative: you might try using multiple methods and seeing if you get a consensus, but it is more likely you'll get a different tree for each method and each
+7 for first (parsimony or ML); +3 for the second
+5 for other potentially valid method (method comparison)
8. What is an outgroup? Suggest a species that would be an appropriate outgroup for a tree linking humans, cows, raccoons, dogs, and elephants. [10 pts]
An outgroup is a taxon (or taxa) known to lie outside a group of related taxa (the ingroup), i.e., it is more distantly related to each ingroup member than the ingroup members are to one another. In phylogenetic analysis, the outgroup determines the root of a tree. If one were comparing humans, cows, raccoons, dogs, and elephants, which are all mammals, an appropriate outgroup would be a non-mammalian vertebrate species such as a reptile.
It is important to select an outgroup that can be unequivocally placed outside of the clustered taxa, but is not so distant as to be uninformative.
+6 pts to describe outgroup
+2 pts to pick outgroup
+2 pts for use of outgroup to root a tree
9. What is an outgroup used for in phylogenetic analysis? [5 pts]
An outgroup, a species more distant from each of the other species than any of those species is from one another, is used to convert an unrooted tree into a rooted tree.
 
10. What is meant by "once a gap, always a gap"? [5 pts]
Gaps generated in the dynamic programming alignment of an initial pair of sequences are kept in all subsequent alignments, and the gap penalty is paid only once. This can lead to problems if a gap is placed incorrectly early on, because it cannot be "fixed" later.
11. What are the main differences between the Fitch-Margoliash method, the neighbor joining method, and the UPGMA method? [5 pts]
UPGMA/Fitch-Margoliash and neighbor-joining are both tree-building algorithms using distance matrices. UPGMA/Fitch-Margoliash builds a tree by selecting the two closest (least distant) sequences to create a pair, then adding the next nearest sequence one at a time. Neighbor-joining considers every pair of sequences and joins the pair that minimizes the total branch length of the tree, which takes each sequence's distances to all of the remaining taxa into account.
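
The difference in how the first pair is chosen can be illustrated with a small Python sketch. The distances below are hypothetical, taken from an imagined four-taxon tree in which B and D sit on long branches; the function names closest_pair and nj_pair are assumptions for this example. The neighbor-joining choice uses the standard criterion Q(i,j) = (n-2)d(i,j) - r(i) - r(j), where r(i) is the sum of distances from i to all other taxa.

```python
from itertools import combinations

# Hypothetical distances from the tree ((A:1,B:4):1,(C:1,D:4)),
# in which B and D have evolved faster than A and C.
d = {
    ("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
    ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5,
}
taxa = ["A", "B", "C", "D"]

def dist(x, y):
    # Distances are symmetric; look the pair up in either order.
    return d[(x, y)] if (x, y) in d else d[(y, x)]

def closest_pair(taxa):
    # UPGMA-style choice: simply the least distant pair.
    return min(combinations(taxa, 2), key=lambda p: dist(*p))

def nj_pair(taxa):
    # Neighbor-joining choice: minimize Q(i,j) = (n-2)*d(i,j) - r(i) - r(j).
    n = len(taxa)
    r = {t: sum(dist(t, u) for u in taxa if u != t) for t in taxa}

    def q(pair):
        i, j = pair
        return (n - 2) * dist(i, j) - r[i] - r[j]

    return min(combinations(taxa, 2), key=q)

print("closest pair:", closest_pair(taxa))
print("neighbor-joining pair:", nj_pair(taxa))
```

Here the raw-distance rule pairs A with C, the two slowly evolving sequences, while the neighbor-joining criterion pairs A with B, which are true neighbors in the assumed tree.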
12. Bootstrapping is a resampling method. What is it that is resampled and how is this done? [10 pts]
In bootstrapping, different columns in a multiple sequence alignment are resampled. This can be done in a variety of ways, but they all involve substituting some columns with other columns. Methods vary as to whether some columns are discarded or not.
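
A minimal Python sketch of one common form of column resampling, drawing columns at random with replacement until the replicate has the same length as the original alignment. The function name bootstrap_alignment and the toy alignment are assumptions for illustration; real programs then build a tree from each replicate.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    length = len(alignment[0])
    # Draw column indices at random; some repeat, others are left out.
    cols = [rng.randrange(length) for _ in range(length)]
    # Apply the same column choices to every sequence so columns stay intact.
    return ["".join(seq[c] for c in cols) for seq in alignment]

# Hypothetical three-sequence alignment (illustrative only).
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicate = bootstrap_alignment(alignment, rng)
for seq in replicate:
    print(seq)
```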
13. Why does one need to "correct" distance used in making trees? [5 pts]
The major reason is to correct for different clocks or evolutionary rates used in different branches or at different times. A clock also usually assumes neutral mutations; this clearly is not correct - selective pressure related to function exists for many residues.
14. Describe two problems with parsimony approaches to constructing trees. [10 pts]
Many problems are possible here, including:
1) can not get branch lengths from parsimony
2) some positions support one tree, some support another.
3) not all sites are "informative", i.e., they don't permit discrimination between two different tree topologies.
4) One must have at least two characters, each occurring at least twice in the tree, to deduce the tree. This is sometimes difficult, particularly with the small alphabet found in DNA.
5) If there were multiple mutations, parsimony has a particularly hard time.
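
The rule in point 4 can be checked column by column. The following Python sketch is only an illustration; the function name is_informative and the sample columns are assumptions, and gap characters are not treated specially.

```python
from collections import Counter

def is_informative(column):
    """True if at least two different characters each occur at least twice."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2

# Hypothetical alignment columns for four sequences (illustrative only).
columns = ["AAAA", "AAAG", "AAGG", "ACGT", "AACG"]
flags = [is_informative(col) for col in columns]
for col, flag in zip(columns, flags):
    print(col, flag)
```

Only the column AAGG is informative here: the others are invariant, have a single odd character, or have no second character that occurs twice.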
 
15. How does the "Occam's razor" principle relate to parsimony trees? [5 pts]
The tree that would require the fewest inferred mutations is considered to be the "correct" one: minimum evolution principle.
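
Counting the inferred mutations for a given tree can be sketched with Fitch's algorithm, a standard way of finding the minimum number of changes at a site on a fixed tree. This is a minimal illustration, not the PAUP or PROTPARS implementation; the function names, the nested-tuple tree encoding, and the toy alignment are assumptions.

```python
def fitch_site(tree, states):
    """Return (possible_states, changes) for one site on a rooted binary tree.

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree)
    states: dict mapping leaf name -> character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one more mutation must be inferred.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, alignment):
    """Total number of inferred changes summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

# Hypothetical three-site alignment and two candidate topologies.
alignment = {"A": "AAG", "B": "AAG", "C": "CGG", "D": "CGT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(tree_length(tree1, alignment), tree_length(tree2, alignment))
```

For this toy data the first topology needs 3 changes and the second needs 5, so parsimony prefers the first.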
 
16. What kind of tree would be best for classifying enzymes according to function? Why? [5 pts]
Here one wants the maximum and best information for the regions of the multiple sequence alignment that have to do with enzyme function. These are likely to be the slowest changing regions, i.e., the most highly conserved regions, in the multiple sequence alignment.