Exercise 7 - Spring, 2001 -
BIMM 141
KEY
Points: 260 pts
30 pts A. UPGMA alignment using GCG PILEUP
40 pts B. Neighbor joining alignment using CLUSTAL W
40 pts C. Parsimony trees using GCG version of PAUP
20 pts D. Parsimony and Maximum Likelihood using Phylip
130 pts E. Questions
{A. UPGMA based progressive multiple alignment using
GCG PILEUP.}
{1. Find a suitable family of sequences.}
[0 pts]
Indicate what and how sequences
were selected.
{2. Align your sequences using PILEUP.}
[10 pts]
Show output file
{3. Repeat using different gap penalties.}
[10 pts]
Compare/contrast default PILEUP
alignment with alignments having different gap penalties.
Do you get the same or similar alignments, or very different alignments?
How good are the various alignments?
{4. Generate UPGMA and neighbor joining trees from your
alignment}
[10 pts]
Show DISTANCES output and GROWTREE
output.
{B. Neighbor joining based progressive alignment with
CLUSTALW.}
{1. Find out about the CLUSTAL W program.}
[0 pts]
Indicate this was done.
{2. Align your sequences using CLUSTAL W, using both
default parameters and the "Slow/Fast pairwise alignments"
option. Report these trees in your notebook and compare
them with the ones generated in A4 above.}
[15 pts]
Show outputs
[5 pts for comparisons]
{3. Examine the alignment and manually correct poorly
aligned regions, remove very large gaps, etc. What changes did
you make to your alignment and why?}
[5 pts]
Indicate what you did
{5. Write out your final tree in Phylip (.ph) and GCG
(.msf)(alignment) formats for use in the next two sections of
this exercise.}
[20 pts]
Show output .ph and .msf formatted
outputs
{C. Parsimony trees using the GCG version of PAUP.}
{1. Read the CLUSTAL alignment into the GCG PAUPSEARCH
program. Construct a tree using the "Exhaustive tree search"
and "Parsimony" options. Make a plot of this tree using
the PAUPDISPLAY program.}
[10 pts]
Show output trees
{2. Construct a tree using the "Heuristic tree search"
and "Parsimony" options. How does this tree compare
to the tree in C1?}
[10 pts]
Show output file
[5 pts for tree; 5 pts for answer
to question]
{3. Perform a bootstrap analysis of the parsimony tree
using the "Bootstrap analysis using branch-and-bound search"
and "Parsimony" options. Based on the partition analysis,
what are the most likely alternatives to your tree? }
[10 pts]
Show output file
[5 pts for tree; 5 pts for answer
to question]
{4. Perform a bootstrap analysis using the "Bootstrap
analysis using neighbor-joining distance" option and compare
this to the CLUSTALW analysis.}
[10 pts]
Show output file
[5 pts for tree; 5 pts for answer
to question]
{D. Parsimony and maximum likelihood trees using Phylip.}
{1. Familiarize yourself with the Phylip PROTPARS program
by reading the document in /software/nonrdist/phylip-3.6/doc/protpars.html.
Run PROTPARS (/home/solaris/nsci/bi141s/protpars) and compare
the tree to those of PAUPSEARCH. }
[10 pts]
Show output file
[5 pts for tree; 5 pts for comparison]
{2. Familiarize yourself with the Phylip PROML program
by reading the document in /software/nonrdist/phylip-3.6/doc/proml.html.
Run PROML (/home/solaris/nsci/bi141s/proml) and compare to the
other trees you have constructed.}
[10 pts]
Show output file
[5 pts for tree; 5 pts for comparison]
{E. Questions.}
- 1. Explain what is meant by progressive alignment. [ 10 pts]
A progressive alignment starts
with the dynamic programming alignment of two initial sequences.
This alignment is then fixed and used as the core to which the
remaining sequences are aligned one at a time. Progressive
alignment is the only practical approach to multiple alignment
because it is too computationally expensive to test all possible
alignments.
+8 for explanation of progressive
alignment
+2 for explaining that other methods are computationally expensive
- 2. Why are progressive alignments the only practical multiple
alignment techniques? [5 pts]
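For computational time reasons:
a general approach (aligning all sequences simultaneously by
dynamic programming) would require too much computation, because
the work grows exponentially with the number of sequences.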
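The scale of the problem can be illustrated with a rough count of dynamic programming cells. This is only a sketch: the function names and the simple cost model for the progressive strategy are assumptions made for illustration, not something PILEUP or CLUSTAL W reports.

```python
def full_dp_cells(n_seqs: int, length: int) -> int:
    """Cells in the lattice for simultaneous alignment of n_seqs sequences.

    Each sequence of the given length contributes one axis of size length + 1.
    """
    return (length + 1) ** n_seqs


def progressive_cells(n_seqs: int, length: int) -> int:
    """Rough cell count for a progressive strategy (hypothetical cost model).

    Assumes one two-dimensional alignment per sequence pair for the distance
    matrix, plus one two-dimensional alignment for each of the n_seqs - 1 joins.
    """
    pairwise = n_seqs * (n_seqs - 1) // 2
    joins = n_seqs - 1
    return (pairwise + joins) * (length + 1) ** 2


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, full_dp_cells(n, 100), progressive_cells(n, 100))
```

For ten sequences of length 100, the full lattice has 101 to the 10th power cells, while the progressive estimate stays in the hundreds of thousands.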
-
- 3. Explain the basic steps used by PILEUP in making a progressive
alignment. [10 pts]
PILEUP generates a progressive
alignment by first doing pairwise global dynamic programming
alignments to create a distance matrix. Next, PILEUP determines
which two sequences are the closest and aligns them by dynamic
programming. This alignment then serves as the core for aligning
subsequent sequences. A UPGMA (unweighted pair-group method using
arithmetic averages) tree is used to determine the order for
adding the rest of the sequences.
From the PILEUP help file:
PileUp begins by doing a pairwise alignment that scores the similarity
between every possible pair of sequences. These similarity scores
are used to create a clustering order that can be represented as a
dendrogram. The clustering strategy represented by the dendrogram is
called UPGMA that stands for unweighted pair-group method using
arithmetic averages (Sneath, P.H.A. and Sokal, R.R. (1973) in
Numerical Taxonomy (pp. 230-234), W.H. Freeman and Company, San
Francisco, California, USA).
PileUp uses this clustering order and first aligns the two most-related
sequences to each other in order to produce the first cluster. It then
aligns the next most related sequence to this cluster or the next two
most-related sequences to each other in order to produce another
cluster. A series of such pairwise alignments, including increasingly
dissimilar sequences and clusters of sequences at each iteration,
produces the final alignment.
+7 for the steps (pairwise matrix
generation; core alignment; addition of seqs)
+3 for UPGMA
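The clustering step described above can be sketched in a few lines of Python. This is a minimal illustration, not PILEUP's implementation: it assumes distances rather than PileUp's similarity scores, and the function name and input layout are hypothetical.

```python
from itertools import combinations


def upgma_order(names, dist):
    """Return UPGMA merges as (cluster_a, cluster_b, distance) tuples.

    names: list of sequence labels.
    dist:  dict mapping frozenset({a, b}) to the pairwise distance.
    """
    # Each cluster is a tuple of member names; start with singletons.
    clusters = [(n,) for n in names]
    d = {frozenset([(a,), (b,)]): dist[frozenset([a, b])]
         for a, b in combinations(names, 2)}
    merges = []
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merges.append((a, b, d[frozenset([a, b])]))
        new = a + b
        # UPGMA update: size-weighted average distance to every other cluster.
        for c in clusters:
            if c not in (a, b):
                d[frozenset([new, c])] = (
                    len(a) * d[frozenset([a, c])]
                    + len(b) * d[frozenset([b, c])]
                ) / len(new)
        clusters = [c for c in clusters if c not in (a, b)] + [new]
    return merges


if __name__ == "__main__":
    pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
             ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
    dist = {frozenset(k): v for k, v in pairs.items()}
    for merge in upgma_order(["A", "B", "C", "D"], dist):
        print(merge)
```

The order of the returned merges is the order in which a progressive aligner would perform its pairwise alignments: closest pair first, then progressively more distant sequences or clusters.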
- 4. What type of tree is used as a guide tree by PILEUP? [5 pts]
UPGMA
-
- 5. What are the main enhancements in CLUSTAL with respect
to PILEUP? [20 pts]
CLUSTAL, like PILEUP, uses dynamic
programming to do pairwise alignments and makes a distance matrix.
The difference is that CLUSTAL uses neighbor-joining to build
the tree from the distance matrix.
Advantages of CLUSTAL include: (1) it can use either fast
(ktup-based) or slow methods for calculating initial distances;
(2) it uses multiple scoring tables to match sequences; (3) it can
set up position-dependent gap penalties; (4) it can weight
sequences so that closely related sequences do not dominate the
alignment; and (5) it automatically creates a tree based on the
final alignment.
+4 pts for each fact with a max
of 5 facts.
- 6. What are the two most serious drawbacks common to all
progressive alignments? [10 pts]
The most serious drawbacks to
progressive alignments are (1) mistakes made early on tend to be
kept without possibility of correction and (2) the alignment and
the estimate of its reliability are heuristic (not mathematically
rigorous).
+8 for "mistakes"
+2 for "heuristics"
- 7. Describe two ways one can evaluate the correctness of
a phylogenetic tree based on sequence data.
[10 pts]
The two major sequence data-based
tree-building methods are parsimony and maximum likelihood.
Parsimony trees are usually evaluated by bootstrapping. In this
method, the data set is randomly sampled with replacement to
create a new data set from which a tree is built. This is repeated
100-500 times for "quick and dirty" results and ~1000+ times for
reliable/publishable results. A good tree/branch is one that
is seen in ~95% of the bootstrap trees.
Maximum likelihood evaluates its own trees against an explicit
model of sequence evolution. Every tree has a likelihood score
and the "best tree" is selected as the tree with the
greatest log likelihood.
(Alternative: you might try using multiple methods and seeing
if you get a consensus, but it is more likely you'll get a different
tree for each method and each
+7 for first (parsimony or ML);
+3 for the second
+5 for other potentially valid method (method comparison)
- 8. What is an outgroup? Suggest a species that would be an
appropriate outgroup for a tree linking humans, cows, raccoons,
dogs, and elephants. [10 pts]
An outgroup is a taxon (or taxa)
known to lie outside a group of more closely related taxa (the
ingroup). In phylogenetic analysis, the outgroup determines the
root of a tree. If one were comparing humans, cows, raccoons, dogs,
and elephants, which are all mammals, an appropriate outgroup would
be a non-mammalian vertebrate species such as a reptile.
It is important to select an outgroup that can unequivocally be
placed outside of the clustered taxa, but not so distant as to
be uninformative.
+6 pts to describe outgroup
+2 pts to pick outgroup
+2 pts to use of outgroup to root a tree
- 9. What is an outgroup used for in phylogenetic analysis?
[5 pts]
- An outgroup, a species more distant
from each of the other species than any of the other species
is to one another, is used to provide a basis
for conversion of an unrooted tree into a rooted tree.
-
- 10. What is meant by "once a gap, always a gap"?
[5 pts]
When gaps are generated in the
dynamic alignment between an initial pair of sequences, the
penalty is only paid once and the gap is kept in all subsequent
alignments. This can lead to problems if the gap is assigned
incorrectly early on because it cannot be "fixed"
later.
- 11. What are the main differences between the Fitch-Margoliash
method, the neighbor joining method, and the UPGMA method? [5 pts]
UPGMA/Fitch-Margoliash and neighbor-joining
are all tree-building algorithms that use distance matrices.
UPGMA/Fitch-Margoliash builds a tree by selecting the two closest
(least distant) sequences to create a pair, then adding the next
nearest sequence one at a time. Neighbor-joining compares every
pair of sequences and joins the pair that minimizes the total
branch length of the resulting tree, rather than simply the
closest pair.
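The difference in how the first pair is chosen can be shown with a small sketch. The selection criterion below is the standard neighbor-joining Q value, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the sum of a taxon's distances to all others. The function names and the five-taxon distance matrix are illustrative assumptions, not data from this exercise.

```python
def closest_pair(names, d):
    """Pair with the smallest raw distance (the UPGMA-style first join)."""
    pairs = [(i, j) for k, i in enumerate(names) for j in names[k + 1:]]
    return min(pairs, key=lambda p: d[p[0]][p[1]])


def nj_pair(names, d):
    """Pair minimizing the neighbor-joining Q criterion."""
    n = len(names)
    # r[i] is the sum of distances from taxon i to every other taxon.
    r = {i: sum(d[i][k] for k in names if k != i) for i in names}
    pairs = [(i, j) for k, i in enumerate(names) for j in names[k + 1:]]
    return min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])


# Hypothetical symmetric distance matrix for five taxa.
NAMES = ["a", "b", "c", "d", "e"]
ROWS = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
D = {i: dict(zip(NAMES, row)) for i, row in zip(NAMES, ROWS)}

if __name__ == "__main__":
    print("closest pair:", closest_pair(NAMES, D))
    print("neighbor-joining pair:", nj_pair(NAMES, D))
```

With this matrix the two rules disagree on the first join, which is why trees built from the same distances by the two methods need not have the same topology.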
- 12. Bootstrapping is a resampling method. What is it that
is resampled and how is this done? [10
pts]
In bootstrapping, the columns
(sites) of a multiple sequence alignment are resampled. Columns
are drawn at random with replacement to build a new alignment, so
some columns appear more than once and others are left out.
Methods vary as to how columns are drawn and whether some columns
are discarded or not.
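A minimal sketch of column resampling, assuming the alignment is held as a dictionary of equal-length strings (the function name and data layout are hypothetical):

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    rng:       a random.Random instance, passed in so runs can be reproduced.
    """
    length = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(length) for _ in range(length)]
    # The same columns are used for every sequence, so each resampled
    # column remains an intact column of the original alignment.
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGA"}
    print(bootstrap_alignment(aln, random.Random(0)))
```

A tree is then built from each replicate, and the support for a branch is the fraction of replicate trees in which it appears.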
- 13. Why does one need to "correct" distance used
in making trees? [5 pts]
The major reason is to correct
for different clocks or evolutionary rates operating in different
branches or at different times. A clock also usually assumes
neutral mutations; this clearly is not correct, since selective
pressure related to function exists for many residues.
- 14. Describe two problems with parsimony approaches to constructing
trees. [10 pts]
- Many problems are possible here, including:
- 1) cannot get branch lengths from
parsimony
- 2) some positions support one tree,
some support another.
- 3) not all sites are "informative",
i.e., they don't permit discrimination between two different tree
topologies.
- 4) One must have at least two characters,
each occurring at least twice in the tree, to deduce the tree.
This is sometimes difficult, particularly with the small alphabet
found in DNA.
- 5) If there were multiple mutations at the same site,
parsimony has a particularly hard time.
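The test for an "informative" site in points 3 and 4 can be sketched as follows. The function names are hypothetical, and ignoring gap characters is an assumption of this sketch; programs differ in how they treat gaps.

```python
from collections import Counter


def is_informative(column):
    """True if at least two characters each occur in at least two sequences."""
    counts = Counter(ch for ch in column if ch != "-")
    return sum(1 for c in counts.values() if c >= 2) >= 2


def informative_sites(seqs):
    """Indices of the parsimony-informative columns of an alignment."""
    return [i for i in range(len(seqs[0]))
            if is_informative([s[i] for s in seqs])]


if __name__ == "__main__":
    seqs = ["AAGA", "AAGC", "AGTG", "AGTT"]
    print(informative_sites(seqs))
```

Invariant columns and columns where only one sequence differs add the same number of changes to every topology, so they cannot help choose between trees.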
-
- 15. How does the "Occam's razor" principle relate
to parsimony trees? [5 pts]
- The tree that would require the fewest
inferred mutations is considered to be the "correct"
one: minimum evolution principle.
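Counting the inferred mutations for a single alignment column can be sketched with Fitch's algorithm, a standard way to score a site on a given tree. The nested-tuple tree representation and the function name are assumptions of this sketch and are unrelated to how PAUP or PROTPARS store trees.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one column on a binary tree.

    tree:   a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name to its character in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The subtrees can share a state, so no extra change is needed.
        return common, left_changes + right_changes
    # No shared state: one change is required on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


if __name__ == "__main__":
    column = {"A": "G", "B": "G", "C": "T", "D": "T"}
    print(fitch((("A", "B"), ("C", "D")), column)[1])
    print(fitch((("A", "C"), ("B", "D")), column)[1])
```

Summing this count over all columns gives the length of a tree; the parsimony criterion prefers the topology with the smallest total.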
-
- 16. What kind of tree would be best for classifying enzymes
according to function? Why? [5 pts]
- Here one wants the maximum and best
information for the regions of the multiple sequence alignment
that have to do with enzyme function. These are likely to be
the slowest changing regions, i.e., the most highly conserved
regions, in the multiple sequence alignment.
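One simple way to locate candidate conserved regions is to score each alignment column by the frequency of its most common residue. This is only a sketch under that assumption; the function name is hypothetical, and real analyses usually use substitution-matrix or entropy-based scores.

```python
from collections import Counter


def column_conservation(seqs):
    """Fraction of sequences sharing the most common residue in each column."""
    scores = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


if __name__ == "__main__":
    print(column_conservation(["ACGT", "ACGA", "ACTA"]))
```

Columns scoring at or near 1.0 are the slowest changing positions and are the ones most likely to reflect functional constraint.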
-