Pharm 207/Bio 207 Home Page

Using Internet Resources in Molecular Biology - Lecture 6

Phylogenetic Trees

Lecturer: Ilya Shindyalov
Date & Time: 3pm-5pm 11/25

Table of Contents


Lecture Outline

  • Evolutionary change in sequences.
  • Rates of evolution, molecular clocks.
  • Molecular phylogeny: UPGMA, neighbor-joining methods.

  • | TOC & Lecture Outline | Introduction | Reading Materials | Lecture Goals and Assignment | Examples |


    Molecular evolution deals with two problems: (i) understanding the evolution of macromolecules (DNA, RNA, proteins); (ii) reconstructing the evolutionary history of genes and organisms.

    The most common principle in the reconstruction of phylogenies is the so-called principle of maximum parsimony, or minimum evolution: search for the tree that requires the fewest mutations to explain the differences observed between the sequences.
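
    As a minimal sketch of what "fewest mutations" means in practice, the Python code below counts the smallest number of changes a given tree requires, using the standard Fitch counting procedure on one alignment column at a time. The nested-tuple tree representation, the function names, and the toy four-sequence alignment are assumptions made for illustration; they are not part of the lecture material.

```python
def fitch_score(tree, states):
    """Return the minimum number of state changes needed on `tree` for one site.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its character at this site
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):       # leaf: the state set is the observed character
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        common = left & right
        if common:                      # children share a state: no change needed here
            return common
        changes += 1                    # children disagree: one change is required
        return left | right

    visit(tree)
    return changes


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch scores over all columns of an alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )


# Toy alignment (hypothetical data) and the three possible groupings of four taxa.
alignment = {"A": "GATC", "B": "GATT", "C": "CATT", "D": "CTTT"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for tree in candidates:
    print(tree, parsimony_score(tree, alignment))
```

    For this toy data the grouping of A with B needs 3 changes and the other two groupings need 4, so the first tree would be preferred under parsimony. Real programs also have to search over tree shapes, which this sketch does not attempt.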

    UPGMA = unweighted pair group method with arithmetic mean.

    UPGMA example for 4 taxa:

    (i) We start with a distance matrix for 4 OTUs (operational taxonomic units) designated A, B, C, D. In our case, where we use information about protein sequences, each sequence (or what is represented by that sequence) is one OTU. The distance matrix contains the evolutionary distance for each pair of OTUs, i.e. a measure of how far the two sequences have diverged. The distance may be calculated, for instance, as the number of amino-acid differences between the two sequences.

                               A      B      C
                           B  d(A,B) 
                           C  d(A,C) d(B,C)
                           D  d(A,D) d(B,D) d(C,D)
       d(X,Y) is the evolutionary distance between two taxa X and Y.
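    (ii) Let's assume d(A,B) has the smallest value. We join A and B into a joint taxon AB, with the branch point at d(A,B)/2, and recompute the distance matrix:
                               AB      C
                           C  d(AB,C)
                           D  d(AB,D) d(C,D)
             d(AB,C)=(d(A,C)+d(B,C))/2   d(AB,D)=(d(A,D)+d(B,D))/2
             AB is the joint taxon of A and B.
             The tree so far (each branch has length d(A,B)/2):
                         ___________ A
                        |
                        |___________ B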
    (iii) Let's assume d(AB,C) is the smallest distance now.
                         ___________ A
                   _____|
                  |     |___________ B
                  |
                  |_________________ C
    (iv) The last step is to connect D to the tree, using d(ABC,D)=(d(A,D)+d(B,D)+d(C,D))/3.
                         ___________ A
                   _____|
             _____|     |___________ B
            |     |
            |     |_________________ C
            |
            |_______________________ D
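
    The four steps above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name, the nested-tuple tree, the use of frozenset keys for the distance table, and the example distances are all assumptions. The update weights each cluster by the number of OTUs it contains, which reduces to the averaging formula of step (ii) when both clusters are single OTUs.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs with UPGMA and return (tree, root_height).

    names -- list of OTU names
    dist  -- dict mapping frozenset({x, y}) to the distance d(x, y)
    The tree is a nested tuple; root_height is half the last joined distance.
    """
    # Each active cluster maps to (number of OTUs in it, height of its node).
    clusters = {name: (1, 0.0) for name in names}
    d = dict(dist)

    while len(clusters) > 1:
        # Find the pair of active clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        (size_a, _), (size_b, _) = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        height = d[frozenset((a, b))] / 2   # branch point sits at half the distance

        # Distance from the merged cluster to every remaining cluster:
        # the size-weighted mean of the two old distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = (size_a + size_b, height)

    tree, (_, root_height) = next(iter(clusters.items()))
    return tree, root_height


# Hypothetical distances chosen so that A and B join first, then C, then D.
raw = {
    ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
    ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6,
}
dist = {frozenset(pair): value for pair, value in raw.items()}
tree, root_height = upgma(["A", "B", "C", "D"], dist)
print(tree, root_height)
```

    With these example distances the result is the nested tuple ('D', ('C', ('A', 'B'))) with a root height of 3.0, which has the same shape as the tree drawn in step (iv).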


    Reading Materials

    1. Fitch W.M. (1977) On the problem of discovering the most parsimonious tree. Amer. Natur., 111, 223-257.
    2. (required) Saitou N., Nei M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4, 406-425.
    3. Czelusniak J. et al. (1990) Maximum parsimony approach to construction of evolutionary trees from aligned homologous sequences. Methods in Enzymology, 183, 601-615.
    4. (required) Li W.-H., Gouy M. (1990) Statistical test of molecular phylogenies. Methods in Enzymology, 183, 645-659.
    5. Sneath P.H., Sokal R.R. (1973) Numerical taxonomy. Freeman, San Francisco.
    6. Kimura M. (1983) The neutral theory of molecular evolution. Cambridge University Press.
    7. Nei M. (1987) Molecular evolutionary genetics. Columbia University Press.
    8. Li W.-H., Graur D. (1991) Fundamentals of molecular evolution. Sinauer Associates, Inc.

    Software and information resources:

  • Phylogeny programs (at Washington University).
  • CMSMBR Molecular Evolution - Phylogeny.
  • DNASYSTEM Seq Analysis Page
  • Biology WorkBench.


    Lecture Goals & Assignment

    To learn how to prepare data and how to construct and analyze phylogenetic trees.

  • If the sequences are not yet aligned, align them using an available multiple alignment algorithm.
  • Starting from a set of aligned sequences for a given family, construct phylogenetic trees using the available methods.
  • Compare with trees obtained for a different protein family (e.g., globins).
  • Compare with the known taxonomy of the organisms.
  • Include the final tree in your Web page for your pet protein.



    General scenario:

    Sequences can be obtained from the NCBI server either by a text search in Entrez or by a BLAST search with a representative sequence (use the blastp program).

    The sequences should then be extracted in FASTA format, and a multiple alignment should be built using the ClustalW program available at Biology WorkBench.
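
    If you want to inspect FASTA-formatted sequences yourself before submitting them, a minimal reader might look like the sketch below. The function name and the two example records (with made-up placeholder sequences) are assumptions for illustration only; each record is a ">" header line followed by one or more sequence lines.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # header text without the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical records; the sequences are placeholders, not real proteins.
example = """>seq1 hypothetical first record
MKVLA
MKVLG
>seq2 hypothetical second record
MRVLG
"""
records = read_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```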

    The resulting multiple alignment should then be used to construct a phylogenetic tree, again using the ClustalW program, and the tree should be visualized using the PHYLIP program at Biology WorkBench.
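
    To see how an alignment turns into the input of a distance-based method such as UPGMA, the sketch below counts amino-acid differences between every pair of aligned sequences, the simple distance measure mentioned in the UPGMA example. It illustrates the idea only and is not what ClustalW computes internally; the function names, the choice to skip gap positions, and the toy alignment are assumptions.

```python
def difference_count(seq_a, seq_b):
    """Count aligned positions where two sequences differ, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != "-" and y != "-" and x != y
    )


def distance_matrix(alignment):
    """Return {frozenset({name1, name2}): distance} for every pair of sequences."""
    names = list(alignment)
    return {
        frozenset((a, b)): difference_count(alignment[a], alignment[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy alignment (hypothetical data); "-" marks a gap.
alignment = {"A": "MKVLA", "B": "MKVLG", "C": "MRVLG", "D": "MR-IG"}
matrix = distance_matrix(alignment)
for pair, value in matrix.items():
    print(sorted(pair), value)
```

    Raw difference counts underestimate the true divergence for distant sequences because repeated changes at the same position are counted once, so treat them as a first approximation.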

    The phylogenetic tree can then be saved as a GIF file and included in your Web page.

    Step-by-step example:

    We use a set of a few cAMP-dependent protein kinases as a test.

    1. If you have a set of aligned sequences, go to step 5; if you have a set of sequences that are not aligned, go to step 4; if you have fewer than 4 sequences, go to step 2.

    2. We start with a single sequence in FASTA format:

    >gi|2408086|gnl|PID|e1168586 camp dependent protein kinase regulatory chain

    3. Search for similar sequences using NCBI BLAST search:

    4. Calculate the multiple alignment using the ClustalW program at Biology WorkBench.

    5. Calculating the tree.

    6. Processing the image.
