This exercise focuses on the interrelated topics of phylogenetic trees and multiple alignments. The main topics addressed in this exercise are:
Although UPGMA is a venerable technique, it is still popular today. The GCG program PILEUP implements a progressive alignment based on an initial UPGMA tree. We have to jump through a few hoops to get what we want. Read the page about PILEUP in the GCG User's Guide to get some idea of the limitations of this program.
{1. Find a suitable family of sequences.}
A suitable group consists of eight to fifteen protein sequences that belong to a homologous family. The sequences should span a considerable evolutionary range (i.e., they should be less than 80-90% identical to one another). BLAST E-values of 10^-40 to 10^-50 are a good range for most of the sequences. Knowing the approximate relationships among the taxa represented by your sequences will help you interpret your results, but this is not essential. Such a group can be assembled in many ways. You might, for instance, use Entrez, or the "taxonomy reports" feature of BLAST, to locate similar sequences from a variety of species. Alternatively, you might select a group of SWISS-PROT entries whose entry names share the same prefix, e.g., a group of asparaginases whose names all begin with "ASPG_". You will use this group of sequences throughout this exercise.
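If your candidate sequences are already aligned (for example, from BLAST pairwise output or a trial PILEUP run), a short script can flag pairs that are too similar to be informative. The sketch below is illustrative only: `percent_identity` and `too_similar` are hypothetical helper names, identity is computed over columns where neither sequence has a gap (one of several conventions in use), and the default cutoff of 85% is simply a value picked from the 80-90% range above.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped; identity is
    reported over the remaining compared columns (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x == y)
    return 100.0 * same / len(compared)


def too_similar(aligned: dict[str, str], cutoff: float = 85.0) -> list[tuple[str, str, float]]:
    """Return (name1, name2, identity) for every pair above the cutoff."""
    flagged = []
    for (n1, s1), (n2, s2) in combinations(aligned.items(), 2):
        pid = percent_identity(s1, s2)
        if pid > cutoff:
            flagged.append((n1, n2, pid))
    return flagged
```

For example, `too_similar({"seqA": "MKV-LLA", "seqB": "MKV-LLA", "seqC": "MRIAL-G"})` flags only the identical pair `("seqA", "seqB", 100.0)`, suggesting that one of the two be dropped.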
{2. Align your sequences using PILEUP.}
Make this first run using the default parameters. PILEUP writes the multiple sequence alignment to a file with the extension .msf (e.g., something.msf).
{3. Repeat using different gap penalties.}
As you know, alignments depend on the gap penalties used to make them. This obviously has major consequences for progressive alignments, because gaps introduced in early pairwise steps are not revisited as more sequences are added. Try an alignment with very high (100,100) or very low (0,0) gap penalties to get an idea of the most extreme effects you could expect to see.
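To see why the two extremes behave so differently, it helps to write the gap cost out. The sketch below assumes the common affine form, in which each gap costs a fixed creation penalty plus an extension penalty per gapped position; treat it as an approximation of how PILEUP's two gap parameters combine, not as GCG's exact formula. The function names are hypothetical.

```python
import re


def gap_penalty(gap_length: int, creation: float, extension: float) -> float:
    """Affine cost of a single gap: fixed creation cost plus a per-position extension cost."""
    return creation + extension * gap_length if gap_length > 0 else 0.0


def total_gap_penalty(aligned: str, creation: float, extension: float) -> float:
    """Sum the affine cost over every run of '-' characters in one aligned sequence."""
    return sum(gap_penalty(len(run), creation, extension)
               for run in re.findall(r"-+", aligned))
```

With penalties of (0,0), `total_gap_penalty("MK--V-L", 0, 0)` is 0, so gaps are free and the aligner may insert them wherever they line up a few more matching residues. With (100,100), the same two gaps cost 500, far more than realistic match scores can offset, so the alignment becomes nearly gapless.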
{4. Generate UPGMA and neighbor joining trees from your alignment}
This requires several steps in the GCG package. First, run the program DISTANCES to calculate a distance matrix from your PILEUP alignment. DISTANCES offers several options for correcting distances for multiple substitutions at the same site; try both corrected and uncorrected distances. Next, use the GROWTREE program to build the trees from the distance matrix. GROWTREE lets you display a tree as either a "phylogram" or a "cladogram". In a phylogram, branch lengths are proportional to the estimated distances; in a cladogram, all branches are drawn the same length, so only the branching order is shown. I suggest using the phylogram.
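The distance and UPGMA steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GCG's code: the correction shown is Kimura's empirical protein formula, d = -ln(1 - p - 0.2p^2), one common choice (check the DISTANCES documentation for the corrections it actually offers), and `upgma` is a textbook UPGMA that repeatedly joins the closest pair of clusters and averages their distances to the rest. All function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of gap-free aligned columns that differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_protein(p: float) -> float:
    """Kimura's empirical correction for multiple substitutions: -ln(1 - p - 0.2 p^2)."""
    return -math.log(1.0 - p - 0.2 * p * p)


def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Cluster a symmetric distance matrix with UPGMA and return a Newick string."""
    # Active clusters: id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i != j}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        a, b = min(((i, j) for i in clusters for j in clusters if i < j),
                   key=lambda pair: d[pair])
        ta, na, ha = clusters.pop(a)
        tb, nb, hb = clusters.pop(b)
        # Molecular-clock assumption: the new node sits at half the pair distance.
        h = d[(a, b)] / 2.0
        subtree = f"({ta}:{h - ha:g},{tb}:{h - hb:g})"
        # Distance to each remaining cluster is the size-weighted average.
        for k in clusters:
            avg = (na * d[(a, k)] + nb * d[(b, k)]) / (na + nb)
            d[(k, next_id)] = d[(next_id, k)] = avg
        clusters[next_id] = (subtree, na + nb, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For a three-taxon matrix with d(A,B) = 2 and d(A,C) = d(B,C) = 6, `upgma` returns `(C:3,(A:1,B:1):2);`. Because UPGMA places every node at half the distance between the clusters it joins, it assumes a constant rate of evolution; neighbor joining does not.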
Neighbor-joining is a more recent technique than UPGMA and is currently very popular, largely due to its implementation in the CLUSTALW and CLUSTALX programs. CLUSTALW is available locally by typing
/home/solaris/nsci/bi141s/clustalw
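Neighbor joining itself is short enough to sketch. The function below is a textbook version of Saitou and Nei's algorithm, written for illustration: at each step it joins the pair of nodes that minimizes the Q criterion, (n - 2)d(i,j) - r(i) - r(j), where r is a row sum of the current distance matrix, and then replaces the pair with a new internal node. CLUSTAL W's own implementation may differ in details such as tie-breaking and the handling of negative branch lengths; the name `neighbor_joining` and its return format are assumptions of this sketch.

```python
def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric distance matrix.

    Returns (joins, final_edge). Each join is (node_i, node_j, length_i, length_j);
    merged nodes are named in Newick-like form. final_edge is (node_a, node_b, length)
    connecting the last two nodes.
    """
    nodes = list(names)
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums over active nodes
        # Choose the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i},{j})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, li, lj))
    a, b = nodes
    return joins, (a, b, d[a][b])


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    joins, last = neighbor_joining(taxa, matrix)
    print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

Note that, unlike UPGMA, the two branches leading to a joined pair can have different lengths, which is what lets neighbor joining cope with unequal rates of evolution.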
{1. Find out about the CLUSTAL W program.}
Article 10 in the readings, which you should already have read, describes in detail how the program works. Most of the parameters should be self-explanatory or will be covered in a demonstration in class.
{2. Align your sequences using CLUSTAL W, first with the default parameters and then with the "Toggle Slow/Fast pairwise alignments" option. Record these trees in your notebook and compare them with the ones generated in A4 above.}
CLUSTALW accepts either FASTA- or GCG-formatted sequences as input. First generate the alignment using the default parameters, which compute the guide-tree distances from full (slow) pairwise alignments. Then rerun with the additional option -quicktree, which computes those distances from fast k-tuple matching instead.
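The idea behind the fast option is that sequences sharing many short identical words (k-tuples) are probably closely related, and counting shared words is much cheaper than a full dynamic-programming alignment. The sketch below is a deliberately simplified, alignment-free illustration with hypothetical function names; CLUSTAL W's actual fast method scores k-tuple matches along alignment diagonals, so its numbers will differ.

```python
from collections import Counter


def ktuple_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a: str, b: str, k: int = 2) -> float:
    """Crude distance: 1 - (shared k-tuples / k-tuples in the shorter sequence)."""
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    shared = sum((ca & cb).values())  # multiset intersection keeps the smaller count
    possible = min(sum(ca.values()), sum(cb.values()))
    return 1.0 - shared / possible if possible else 1.0
```

For example, `ktuple_distance("MKVLA", "MKVIA")` is 0.5: two of the four dipeptides (MK and KV) are shared.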
{3. Examine the alignment and manually correct poorly aligned regions, remove very large gaps, etc. What changes did you make to your alignment and why?}
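Manual editing is best done by eye, but a small script can help you find candidate columns to remove. The sketch below, with a hypothetical function name, drops every column in which the fraction of gapped sequences exceeds a threshold; setting the threshold to 0 removes every column that contains any gap, which mirrors the idea of omitting gap regions when building a tree. The alignment is assumed to be a dictionary of equal-length, already aligned sequences.

```python
def drop_gappy_columns(aligned: dict[str, str], max_gap_fraction: float = 0.5) -> dict[str, str]:
    """Remove alignment columns whose fraction of '-' characters exceeds max_gap_fraction."""
    names = list(aligned)
    length = len(aligned[names[0]])
    keep = [c for c in range(length)
            if sum(aligned[n][c] == "-" for n in names) / len(names) <= max_gap_fraction]
    # Rebuild each sequence from the retained columns only.
    return {n: "".join(aligned[n][c] for c in keep) for n in names}
```

Any column removed this way should still be noted in your notebook, since the question asks what changes you made and why.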
{4. Load your modified alignment back into CLUSTALW and calculate a tree with gapped positions excluded and with 100 bootstrap replicates. Record this tree in your notebook. How does it differ from the previous trees? Is it reliable?}
{5. Write out your final alignment in Phylip (.ph) and GCG (.msf) formats for use in the next two sections of this exercise.}
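Bootstrapping estimates how strongly the data support each branch: the alignment columns are resampled with replacement to make many pseudo-alignments of the original length, a tree is built from each, and the percentage of replicate trees containing a given group is reported as that group's bootstrap value. CLUSTAL W does all of this internally; the generator below, with a hypothetical name, only illustrates the resampling step.

```python
import random


def bootstrap_replicates(aligned, n_reps=100, seed=None):
    """Yield n_reps pseudo-alignments made by sampling columns with replacement.

    `aligned` maps sequence names to equal-length aligned sequences. Passing a
    seed makes the replicates reproducible.
    """
    rng = random.Random(seed)
    names = list(aligned)
    length = len(aligned[names[0]])
    for _ in range(n_reps):
        # Draw `length` column indices; some columns repeat and others are left out.
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(aligned[name][c] for c in cols) for name in names}
```

Each replicate would then go through the same tree-building procedure as the real alignment, and the resulting trees would be tallied group by group.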
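CLUSTAL W writes both formats for you, but it helps to know what a PHYLIP alignment looks like, because PHYLIP programs are strict about it. In the classic sequential layout sketched below, the first line gives the number of sequences and the number of alignment columns, and each following line holds a name padded or truncated to exactly 10 characters followed by the aligned sequence. Truncation is a common source of trouble, since two long names can become identical, so the sketch (with a hypothetical function name) checks for that.

```python
def to_phylip(aligned: dict[str, str]) -> str:
    """Format aligned sequences as a sequential PHYLIP alignment."""
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("all aligned sequences must have the same length")
    short = [name[:10] for name in names]  # PHYLIP names occupy exactly 10 columns
    if len(set(short)) != len(short):
        raise ValueError("names are not unique after truncation to 10 characters")
    lines = [f"{len(names)} {length}"]
    lines += [f"{s:<10}{aligned[n]}" for s, n in zip(short, names)]
    return "\n".join(lines) + "\n"
```

For example, `to_phylip({"human": "MKV", "mouse": "MRV"})` produces a header line `2 3` followed by one padded line per sequence.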
{1. Read the CLUSTAL alignment into the GCG PAUPSEARCH program. Construct a
tree using the "Exhaustive tree search" and "Parsimony" options. Make a plot of this tree using the
PAUPDISPLAY program.} Note that you must use the alignment.msf{*} syntax to specify your GCG-format alignment.
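For a fixed tree, the parsimony score can be computed column by column with Fitch's algorithm: each leaf starts with its residue, each internal node takes the intersection of its children's residue sets if that is non-empty and otherwise their union at the cost of one change, and the tree length is the total number of changes over all columns. An exhaustive search evaluates this score for every possible tree and keeps the shortest. The sketch below is a minimal unweighted version with hypothetical names; it counts every residue change as one step and treats a gap as just another state, whereas PAUP offers weighted step matrices and other gap treatments.

```python
def fitch(tree, states):
    """Return (state set, minimum changes) for one alignment column.

    `tree` is a leaf name or a 2-tuple of subtrees (a rooted binary tree);
    `states` maps each leaf name to its residue in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:
        return common, lc + rc          # children agree: no extra change needed
    return ls | rs, lc + rc + 1         # children disagree: one change on this node


def parsimony_length(tree, aligned):
    """Tree length: total Fitch changes summed over all alignment columns."""
    length = len(next(iter(aligned.values())))
    return sum(fitch(tree, {n: s[c] for n, s in aligned.items()})[1]
               for c in range(length))
```

For the tree `(("A", "B"), ("C", "D"))`, a column reading L, L, I, I costs one change, while L, I, L, I costs two.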
{2. Construct a tree using the "Heuristic tree search" and "Parsimony" options. How does this tree compare to the tree in C1?}
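The heuristic option exists because the number of possible trees grows explosively with the number of sequences. For n taxa there are (2n - 5)!! = 3 × 5 × 7 × ... × (2n - 5) distinct unrooted binary trees, so an exhaustive search that is quick for 8 sequences becomes infeasible for 15. The helper below (a hypothetical name) computes this count.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of distinct unrooted binary trees for n_taxa >= 3 labeled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # multiply the odd numbers 3, 5, ..., 2n - 5
        count *= k
    return count
```

For example, 8 sequences give 10,395 unrooted trees, 10 give 2,027,025, and 15 give 7,905,853,580,625.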
{3. Perform a bootstrap analysis of the parsimony tree using the
"Bootstrap analysis using branch-and-bound search" and "Parsimony" options. Based on the partition
analysis, what are the most likely alternatives to your tree? }
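The partition table summarizes how often each grouping of sequences appears among the bootstrap trees; groupings that appear in a substantial fraction of replicates but conflict with your tree are the most likely alternatives. The sketch below tallies this for trees written as nested tuples. It is a simplification under stated assumptions: all replicate trees are treated as rooted in the same way, so a group and its complement are counted separately, whereas PAUP's partition table reports unrooted bipartitions. The function names are hypothetical.

```python
from collections import Counter


def collect_clades(tree):
    """Leaf sets of all internal nodes of a nested-tuple tree, root last."""
    found = []

    def walk(node):
        if isinstance(node, str):  # a leaf is just its name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)  # post-order traversal, so the root is appended last
        return leaves

    walk(tree)
    return found


def clade_support(trees):
    """Percentage of trees containing each clade, excluding the trivial all-taxa clade."""
    counts = Counter()
    for tree in trees:
        groups = collect_clades(tree)
        root = groups[-1]
        counts.update(set(g for g in groups if g != root))
    return {clade: 100.0 * c / len(trees) for clade, c in counts.items()}
```

Sorting the result by value and looking for well-supported groups that are absent from your own tree is one way to read off the alternatives.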
{4. Perform a bootstrap analysis using the "Bootstrap analysis using neighbor-joining
distance" option and compare this to the CLUSTALW analysis.}
{1. Familiarize yourself with the Phylip PROTPARS program by reading the document in
/software/nonrdist/phylip-3.6/doc/protpars.html. Run PROTPARS
(/home/solaris/nsci/bi141s/protpars) and compare the
tree to those of PAUPSEARCH. } Note that you must use the .ph alignment file from B5.
{2. Familiarize yourself with the Phylip PROML program by reading the document in
/software/nonrdist/phylip-3.6/doc/proml.html. Run PROML
(/home/solaris/nsci/bi141s/proml) and compare to the other trees you have constructed.}
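Maximum likelihood chooses the tree and branch lengths that make the observed alignment most probable under an explicit model of amino-acid substitution. PROML does this over whole trees with empirical substitution models; the sketch below shows only the core idea for a single pair of sequences under a much simpler equal-rates model (a 20-state analogue of Jukes-Cantor), which is an assumption of this example and not PROML's model. For that model the maximum-likelihood distance has the closed form d = -(19/20) ln(1 - 20p/19), where p is the fraction of differing sites, so the numerical search can be checked against it. Function names are hypothetical.

```python
import math


def log_likelihood(d, n_same, n_diff):
    """Log-likelihood of distance d (expected substitutions per site) for two aligned
    proteins under an equal-rates 20-state model; factors independent of d are dropped."""
    x = math.exp(-20.0 * d / 19.0)
    p_same = (1.0 + 19.0 * x) / 20.0  # probability a site shows the same residue
    p_diff = (1.0 - x) / 20.0         # probability of one specific different residue
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(n_same, n_diff, lo=1e-9, hi=10.0, iters=200):
    """Maximize the log-likelihood over d by ternary search (it is unimodal in d)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, n_same, n_diff) < log_likelihood(m2, n_same, n_diff):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def closed_form_distance(p):
    """Analytical ML distance for this model, used to check the numerical search."""
    return -(19.0 / 20.0) * math.log(1.0 - (20.0 / 19.0) * p)
```

For 70 identical and 30 differing sites, `ml_distance(70, 30)` agrees with `closed_form_distance(0.3)` (about 0.36 substitutions per site), noticeably larger than the raw p-distance of 0.3.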
Latest modification: 13 May 2001
{C. Parsimony trees using the GCG version of PAUP.}
BIMM 140: | Main
| 140_Info
| Syllabus
| Lectures
| Exams
| DNASYSTEM
| CMS MBR |
BIMM 141: | Main
| 141_Info
| Syllabus
| Exercises
| DNASYSTEM
| CMS MBR |
{D. Parsimony and maximum likelihood trees using Phylip PROTPARS and PROML.}
{E. Questions.}