Exercise 7
Trees and Multiple Alignments


This exercise focuses on the interrelated topics of phylogenetic trees and multiple alignments. The main topics addressed in this exercise are listed in the outline below.



BIMM 140: | Main | 140_Info | Syllabus | Lectures | Exams | DNASYSTEM | CMS MBR |
BIMM 141: | Main | 141_Info | Syllabus | Exercises | DNASYSTEM | CMS MBR |



If you have problems or questions, send email to Michael Gribskov, Doug Smith, or Hiren Patel.


Exercise 7 - Outline

 

  1. UPGMA based progressive multiple alignment using GCG PILEUP
    1. Find a suitable family of sequences
    2. Align your sequences using PILEUP
    3. Repeat using different gap penalties
    4. Generate UPGMA and neighbor joining trees
  2. Neighbor joining based progressive alignment with CLUSTALW
    1. Find out about the CLUSTAL W program
    2. Align your sequences using CLUSTAL W
    3. Examine the alignment and manually correct it.
    4. Calculate a tree with gap regions omitted and with bootstrapping.
    5. Write out your final tree in Phylip (.ph) and GCG (.msf) (alignment) formats for use in the next two sections of this exercise.
  3. Parsimony trees using the GCG version of PAUP
    1. Construct a parsimony tree using the GCG PAUPSEARCH program. Make a plot of this tree using the PAUPDISPLAY program.
    2. Construct a tree using the "Heuristic tree search" and "Parsimony" options.
    3. Perform a bootstrap analysis of the parsimony tree.
    4. Perform a bootstrap analysis using the "Bootstrap analysis using neighbor-joining distance" option.
  4. Parsimony and maximum likelihood trees using Phylip
    1. Familiarize yourself with the Phylip PROTPARS program and construct a parsimony tree.
    2. Familiarize yourself with the Phylip PROML program and construct a maximum likelihood tree.
  5. Questions

 

Specific Tasks to Perform in Exercise 7:

{A. UPGMA based progressive multiple alignment using GCG PILEUP.}

Although UPGMA is a venerable technique, it is still popular today. The GCG program PILEUP implements a progressive alignment based on an initial UPGMA tree. We have to jump through a few hoops to get what we want. Read the page about PILEUP in the GCG user's guide to get some idea of the limitations of this program.
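To make the idea of a UPGMA guide tree concrete, here is a minimal Python sketch of the clustering step. It is not the PILEUP implementation; the function name `upgma`, the nested-tuple tree representation, and the toy distances are assumptions made only for illustration.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as nested tuples.

    names: list of taxon labels (strings)
    dist:  dict mapping frozenset({a, b}) -> distance between a and b
    (Illustrative sketch only; not how PILEUP stores its data.)
    """
    sizes = {name: 1 for name in names}   # cluster -> number of leaves
    d = dict(dist)                        # working copy of the distances
    while len(sizes) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # The distance from the new cluster to each remaining cluster is
        # the size-weighted average of the two old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))
```

A progressive aligner would then align sequences in the order in which this tree joins them, starting with the closest pair.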

{1. Find a suitable family of sequences.}

A suitable group of sequences will be eight to fifteen protein sequences that belong to a homologous family. The sequences should span a considerable evolutionary range (i.e., they should be less than 80-90% identical to one another). E-values of 10^-40 to 10^-50 are a good range for most of the sequences. It will be helpful in interpreting your results if you know the approximate relationships between the taxa represented by your sequences, but this is not imperative. Such a group can be assembled in many ways. You might, for instance, use Entrez, or the "taxonomy reports" feature of BLAST, to locate similar sequences from a variety of species. Alternatively, you might select a group of SWISS-PROT entries by searching for sequences that share the prefix part of an entry name, e.g., a group of asparaginases that all begin with "ASPG_". You will use this group of sequences throughout this exercise.
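If you want a quick check of the 80-90% identity guideline once you have a rough alignment, something like the following Python sketch would do. The function names, the dictionary input format, and the default cutoff of 90 are assumptions for illustration; none of the course programs work this way.

```python
from itertools import combinations

def percent_identity(row_a, row_b):
    """Percent identity over aligned columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

def redundant_pairs(aligned, cutoff=90.0):
    """List the pairs of sequence names that are more similar than cutoff.

    aligned: dict mapping sequence name -> aligned row ('-' marks a gap)
    """
    return [
        (a, b)
        for a, b in combinations(aligned, 2)
        if percent_identity(aligned[a], aligned[b]) > cutoff
    ]
```

Any pair reported by `redundant_pairs` is a candidate for dropping one member, since near-identical sequences add little evolutionary range to the family.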

{2. Align your sequences using PILEUP.}

Make this first run using default conditions. The multiple sequence alignment produced by PILEUP will be named something.msf.

{3. Repeat using different gap penalties.}

As you know, alignments depend on the gap penalties used in making them. This obviously has serious consequences for progressive alignments, where early alignment decisions are carried through every later step. Try an alignment with very high (100,100) or very low (0,0) penalties to get an idea of the most extreme effects one could expect to see.
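The effect of the two penalties can be seen on a toy example. The Python sketch below scores a fixed pairwise alignment, charging each gap `gap_open + gap_extend * length`. That formula, the match/mismatch scores, and the function name are assumptions chosen for illustration and may not match the exact PILEUP scoring scheme.

```python
def alignment_score(row_a, row_b, gap_open, gap_extend, match=1, mismatch=0):
    """Score two aligned rows; each gap costs gap_open + gap_extend * length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_row = None  # which row the current run of gap characters is in
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            current = "a" if x == "-" else "b"
            if current != gap_row:
                score -= gap_open      # a new gap starts here
            score -= gap_extend        # every gap position pays the extension
            gap_row = current
        else:
            score += match if x == y else mismatch
            gap_row = None
    return score

# Two candidate alignments of the same pair of toy sequences.
ungapped = ("ACGT", "AGCT")     # 2 matches, 2 mismatches, no gaps
gapped = ("A-CGT", "AGC-T")     # 3 matches, 2 gaps of length 1
```

With penalties (0,0) the gapped alignment scores higher (3 versus 2), so gaps are inserted freely; with (100,100) it scores -397, so the ungapped alignment wins by a wide margin.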

{4. Generate UPGMA and neighbor joining trees from your alignment}

This requires several steps in the GCG package. First, run the program DISTANCES to calculate a distance matrix from your PILEUP alignment. DISTANCES gives several options for correcting distances. Try both corrected and uncorrected distances. Next, use the GROWTREE program to generate the tree from the distance matrix. You will note that you have the option to display the tree as a "phylogram" or a "cladogram". In the phylogram, branch lengths are proportional to the computed distances, so they will be uneven; the cladogram shows only the branching order, with branches drawn at even lengths. I suggest using the "phylogram".
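To see what "correcting" a distance means, here is a short Python sketch that compares the uncorrected distance (the fraction of differing residues, p) with a Poisson-corrected distance, d = -ln(1 - p). The Poisson correction is used here only as a simple, well-known example; it is an assumption that it resembles any particular option offered by DISTANCES, and the function names are made up for illustration.

```python
import math

def p_distance(row_a, row_b):
    """Uncorrected distance: fraction of gap-free columns that differ."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def poisson_corrected(p):
    """Poisson correction d = -ln(1 - p) for multiple hits at one site."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)
```

The corrected value is always at least as large as p, and the difference grows as the sequences diverge, because more substitutions are hidden by repeated changes at the same position.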






{B. Neighbor joining based progressive alignment with CLUSTALW.}

Neighbor-joining is a more recent technique than UPGMA and is currently very popular, largely due to its implementation in the CLUSTALW and CLUSTALX programs. CLUSTALW is available locally by typing

	/home/solaris/nsci/bi141s/clustalw
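For comparison with UPGMA, the Python sketch below carries out only the first joining step of neighbor-joining: it computes Q(i,j) = (n-2)d(i,j) - r(i) - r(j), where r is the row sum of the distance matrix, picks the pair with the smallest Q, and reports the two branch lengths to the new node. The function name and the five-taxon matrix are illustrative assumptions; this is not the CLUSTALW code.

```python
def nj_first_join(names, d):
    """Return (name_i, name_j, Q, length_i, length_j) for the first NJ join.

    names: list of taxon labels
    d:     symmetric distance matrix as a list of lists (needs n >= 3)
    """
    n = len(names)
    r = [sum(row) for row in d]          # total distance from each taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from the joined taxa to their new internal node.
    length_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    length_j = d[i][j] - length_i
    return names[i], names[j], q, length_i, length_j

toy_names = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
```

Note that the pair joined here (a and b, at distance 5) is not the closest pair in the matrix (d and e, at distance 3). Unlike UPGMA, neighbor-joining does not assume equal rates of change on all branches, and the two branch lengths it reports (2 and 3) are unequal.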

{1. Find out about the CLUSTAL W program.}

You should have read article 10 in the readings. This includes a lot of details about how the program works. Most of the parameters should be self-explanatory, or will be covered in a demonstration in class.

{2. Align your sequences using CLUSTAL W, using both default parameters and the "Toggle Slow/Fast pairwise alignments" option. Report these trees in your notebook and compare them with the ones generated in A4 above.}

CLUSTALW can accept either FASTA or GCG formatted sequences as input. First generate the alignment using the default parameters. Then add the option "-quicktree", which uses fast k-tuple (ktup) matching instead of full pairwise alignment to compute the distances for the guide tree.
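The idea behind k-tuple matching is that two related sequences share many short words, which can be counted without aligning anything. The Python sketch below is a deliberately simplified illustration of that idea and is not the procedure CLUSTALW actually uses; the function names, the default word length of 2, and the distance formula are assumptions.

```python
def ktuples(seq, k=2):
    """Return the set of all words of length k that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def ktuple_distance(seq_a, seq_b, k=2):
    """Crude distance: 1 minus the fraction of shared k-tuples.

    The fraction is taken relative to the smaller of the two word sets,
    so identical sequences get a distance of 0.
    """
    words_a, words_b = ktuples(seq_a, k), ktuples(seq_b, k)
    smaller = min(len(words_a), len(words_b))
    if smaller == 0:
        return 1.0   # a sequence shorter than k shares nothing
    return 1.0 - len(words_a & words_b) / smaller
```

Because no alignment is computed, a distance like this is fast but rough, which is why the guide tree (and therefore the final alignment) may differ between the slow and fast settings.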

{3. Examine the alignment and manually correct poorly aligned regions, remove very large gaps, etc. What changes did you make to your alignment and why?}

{4. Load your modified alignment back into CLUSTALW and calculate a tree with gap regions omitted and with 100 trials of bootstrapping. Report this tree in your notebook. How does it differ from previous trees? Is it reliable?}

{5. Write out your final tree in Phylip (.ph) and GCG (.msf) (alignment) formats for use in the next two sections of this exercise.}
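The two operations requested in step 4, omitting gap regions and bootstrapping, both act on the columns of the alignment. The following Python sketch shows one way to picture them; the function names and the list-of-strings alignment format are assumptions for illustration, and CLUSTALW performs these steps internally in its own way.

```python
import random

def drop_gap_columns(rows):
    """Remove every alignment column that contains a gap in any sequence."""
    keep = [col for col in zip(*rows) if "-" not in col]
    if not keep:
        return ["" for _ in rows]
    return ["".join(chars) for chars in zip(*keep)]

def bootstrap_replicate(rows, rng):
    """Build one pseudo-alignment by sampling columns with replacement.

    The replicate has the same number of columns as the input, and each
    column is copied whole so that rows stay in register.
    """
    n_cols = len(rows[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in rows]

toy_alignment = ["MK-TL", "MKATL", "MR-TL"]
gap_free = drop_gap_columns(toy_alignment)
replicates = [bootstrap_replicate(gap_free, random.Random(seed)) for seed in range(100)]
```

A tree is then built from each of the 100 replicates, and the support for a grouping is the number of replicate trees in which that grouping appears.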






{C. Parsimony trees using the GCG version of PAUP.}

{1. Read the CLUSTAL alignment into the GCG PAUPSEARCH program. Construct a tree using the "Exhaustive tree search" and "Parsimony" options. Make a plot of this tree using the PAUPDISPLAY program.} Note that you must use the alignment.msf{*} syntax to specify your GCG format alignment.
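Parsimony ranks trees by the minimum number of changes each one requires. As a rough illustration, the Python sketch below counts changes on a fixed rooted binary tree with the Fitch algorithm, treating all residue changes as equal. This is an assumption made for simplicity; the function names and nested-tuple tree format are also illustrative, and PAUP offers other weighting schemes.

```python
def fitch_site(tree, states):
    """Return (possible_states, changes) for one column on a binary tree.

    tree:   a leaf name (str) or a tuple (left_subtree, right_subtree)
    states: dict mapping leaf name -> residue in this column
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no change needed here.
        return common, left_cost + right_cost
    # No shared state: one change must occur on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_length(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_cols = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: row[i] for name, row in alignment.items()})[1]
        for i in range(n_cols)
    )

toy = {"A": "MKL", "B": "MKL", "C": "MRL", "D": "MRV"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
```

Here `tree_1` needs 2 changes and `tree_2` needs 3, so `tree_1` is the more parsimonious of the two. An exhaustive search repeats this comparison over every possible tree.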

{2. Construct a tree using the "Heuristic tree search" and "Parsimony" options. How does this tree compare to the tree in C1?}
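The reason a heuristic search exists at all is that the number of possible trees grows explosively with the number of sequences. For n taxa there are (2n-5)!! = 3 x 5 x ... x (2n-5) distinct unrooted binary trees, which the short Python sketch below computes (the function name is just for illustration).

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted binary trees for n_taxa labeled leaves."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count
```

For the family sizes suggested in this exercise, that is 10,395 trees for eight sequences and 7,905,853,580,625 for fifteen, so an exhaustive search is realistic only near the low end of that range.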

{3. Perform a bootstrap analysis of the parsimony tree using the "Bootstrap analysis using branch-and-bound search" and "Parsimony" options. Based on the partition analysis, what are the most likely alternatives to your tree?}

{4. Perform a bootstrap analysis using the "Bootstrap analysis using neighbor-joining distance" option and compare this to the CLUSTALW analysis.}
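When you read the bootstrap output in steps 3 and 4, it may help to picture how support values arise: each grouping in your tree is looked up in every bootstrap replicate tree, and the percentage of replicates containing it is reported. The Python sketch below does this for rooted trees written as nested tuples. It is a simplification (PAUP tabulates partitions of unrooted trees), and all names in it are assumptions for illustration.

```python
from collections import Counter

def clades(tree):
    """Return the leaf sets below each internal node, excluding the root."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    found.discard(leaves(tree))   # the root clade holds every leaf; skip it
    return found

def clade_support(reference, replicate_trees):
    """Percent of replicate trees that contain each clade of the reference."""
    counts = Counter()
    for rep in replicate_trees:
        counts.update(clades(rep))
    n = len(replicate_trees)
    return {clade: 100.0 * counts[clade] / n for clade in clades(reference)}

reference_tree = (("A", "B"), ("C", "D"))
toy_replicates = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]
support = clade_support(reference_tree, toy_replicates)
```

In this toy case the A+B grouping appears in 3 of 4 replicates (75%) and C+D in 2 of 4 (50%). Groupings that show up in the replicates but not in your tree, such as A+C here, are the alternatives the partition analysis asks about.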






{D. Parsimony and maximum likelihood trees using Phylip.}

{1. Familiarize yourself with the Phylip PROTPARS program by reading the document in /software/nonrdist/phylip-3.6/doc/protpars.html. Run PROTPARS (/home/solaris/nsci/bi141s/protpars) and compare the tree to those of PAUPSEARCH.} Note that you must use the .ph alignment file from B5.

{2. Familiarize yourself with the Phylip PROML program by reading the document in /software/nonrdist/phylip-3.6/doc/proml.html. Run PROML (/home/solaris/nsci/bi141s/proml) and compare to the other trees you have constructed.}






{E. Questions.}

  1. Explain what is meant by progressive alignment.
  2. Why are progressive alignments the only practical multiple alignment techniques?
  3. Explain the basic steps used by PILEUP in making a progressive alignment.
  4. What type of tree is used as a guide tree by PILEUP?
  5. What are the main enhancements in CLUSTAL with respect to PILEUP?
  6. What are the two most serious drawbacks common to all progressive alignments?
  7. Describe two ways one can evaluate the correctness of a phylogenetic tree based on sequence data.
  8. What is an outgroup? Suggest a species that would be an appropriate outgroup for a tree linking humans, cows, raccoons, dogs, and elephants.
  9. What is an outgroup used for in phylogenetic analysis?
  10. What is meant by "once a gap, always a gap"?
  11. What are the main differences between the Fitch-Margoliash method, the neighbor joining method, and the UPGMA method?
  12. Bootstrapping is a resampling method. What is it that is resampled and how is this done?
  13. Why does one need to "correct" the distances used in making trees?
  14. Describe two problems with parsimony approaches to constructing trees.
  15. How does the "Occam's razor" principle relate to parsimony trees?
  16. What kind of tree would be best for classifying enzymes according to function? Why?





Latest modification: 13 May 2001
