Pharm 207/Bio 207 Home Page
Using Internet Resources in Molecular Biology - Lecture 6
Date & Time: 3pm-5pm 11/25
Molecular evolution deals with two problems: (i) understanding evolution of macromolecules (DNA, RNA, proteins); (ii) reconstruction of evolutionary history of genes and organisms.
The most common principle in the reconstruction of phylogenies is the so-called principle of maximum parsimony, or minimum evolution, which means searching for the tree that requires the fewest mutations to explain the differences observed between sequences.
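To make "the tree with the fewest mutations" concrete, here is a minimal sketch that counts the minimum number of changes a given tree needs, column by column. It uses Fitch's counting procedure, which these notes do not name; the function names, the nested-tuple tree encoding, and the toy alignment are assumptions made for illustration only.

```python
def fitch_site(tree, states):
    """Return (possible_states, min_changes) for one alignment column.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping leaf name -> residue in this column
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The subtrees can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # The subtrees disagree: one change is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy alignment, not real data.
toy = {"A": "MKV", "B": "MKV", "C": "MRV", "D": "MRI"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, toy), parsimony_score(tree_2, toy))
```

Under the parsimony principle, the tree with the lower score (here tree_1) would be preferred. A real search would have to compare many candidate trees, which this sketch does not attempt.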
UPGMA = unweighted pair group method with arithmetic mean.
UPGMA example for 4 taxa:
(i) We start with a distance matrix with 4 OTUs (operational taxonomic units) designated as A, B, C, D. In our case, where we use information about protein sequences, each sequence (or what is represented by this sequence) will be one OTU. The distance matrix contains evolutionary distances for pairs of OTUs, i.e. it measures how far two sequences have diverged. Distance may be calculated, for instance, as the number of amino-acid differences between two sequences.
        A         B         C
   B  d(A,B)
   C  d(A,C)    d(B,C)
   D  d(A,D)    d(B,D)    d(C,D)

d(X,Y) is the evolutionary distance for two taxa X and Y.

(ii) Let's assume d(A,B) has the smallest value. We join them in the following way:

        AB        C
   C  d(AB,C)
   D  d(AB,D)   d(C,D)

    ___________ A
   |
   |___________ B
     d(A,B)/2

   d(AB,C) = (d(A,C) + d(B,C))/2
   d(AB,D) = (d(A,D) + d(B,D))/2

AB is a joint taxon of A and B.

(iii) Let's assume d(AB,C) is the smallest distance now.

          ___________ A
    _____|
   |     |___________ B
   |
   |_________________ C
        d(AB,C)/2

   d(ABC,D) = (d(A,D) + d(B,D) + d(C,D))/3

(iv) The last step is to connect D to the tree.

                ___________ A
          _____|
         |     |___________ B
    _____|
   |     |_________________ C
   |
   |_______________________ D
           d(ABC,D)/2
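The four steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code behind any particular program: the function names, the nested-tuple tree, and the toy sequences are assumptions, and distances are taken to be the number of amino-acid differences between aligned sequences, with gap columns skipped (also an assumption).

```python
from itertools import combinations


def count_differences(seq_x, seq_y):
    """Number of aligned positions where two equal-length sequences differ."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap ('-') in either sequence are skipped.
    return sum(1 for a, b in zip(seq_x, seq_y) if "-" not in (a, b) and a != b)


def distance_matrix(alignment):
    """Map frozenset({X, Y}) -> d(X, Y) for every pair of OTUs."""
    return {
        frozenset((x, y)): count_differences(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


def upgma(dist):
    """Cluster OTUs by UPGMA; needs at least two OTUs.

    Returns (tree, merges): tree is a nested tuple of OTU names, and merges
    lists (members, height) for each join, where height is half the average
    distance between the two joined clusters, e.g. d(A,B)/2 for the first.
    """
    names = sorted({name for pair in dist for name in pair})
    clusters = {(name,): name for name in names}  # members -> subtree
    merges = []

    def average(c1, c2):
        # Arithmetic mean of the original distances between all members,
        # e.g. d(AB,C) = (d(A,C) + d(B,C)) / 2.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        height = average(c1, c2) / 2
        subtree = (clusters.pop(c1), clusters.pop(c2))
        members = tuple(sorted(c1 + c2))
        clusters[members] = subtree
        merges.append((members, height))

    (tree,) = clusters.values()
    return tree, merges


# Hypothetical toy alignment, not real data.
toy = {"A": "MKVLAG", "B": "MKVLAS", "C": "MKILTS", "D": "MRIFTC"}
tree, merges = upgma(distance_matrix(toy))
for members, height in merges:
    print("".join(members), height)
```

With this toy input, A and B are joined first, then C, then D, mirroring steps (ii) to (iv).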
The aim is to learn how to prepare data and how to construct and analyze phylogenetic trees.
Sequences can be obtained from the NCBI server either by searching with a textual description in Entrez or by running a BLAST search with a representative sequence (use the blastp program).
The sequences should then be extracted in FASTA format, and a multiple alignment should be built using the ClustalW program available at Biology WorkBench.
The resulting multiple alignment should then be used to construct a phylogenetic tree, again using the ClustalW program, and the tree visualized using the PHYLIP program at Biology WorkBench.
The phylogenetic tree can then be saved as a GIF file and included in your Web page.
We use a set of a few cAMP-dependent protein kinases as a test.
1. If you have a set of aligned sequences, go to step 5; if you have a set of sequences that are not aligned, go to step 4; if you have fewer than 4 sequences, go to step 2.
2. We start with a single sequence in FASTA format:
>gi|2408086|gnl|PID|e1168586 camp dependent protein kinase regulatory chain
MVAGPEAIGPDAKYVPELGGLKEMNVSYPQNYNLLRRQSVSTESMNPSAFALETKRTFPPKDPEDLKRLK
RSVAGNFLFKNLDEEHYNEVLNAMTEKRIGEAGVAVIVQGAVGDYFYIVEQGEFDVYKRPELNITPEEVL
SSGYGNYITTISPGEYFGELALMYNAPRAASVVSKTPNNVIYALDRTSFRRIVFENAYRQRMLYESLLEE
VPILSSLDKYQRQKIADALQTVVYQAGSIVIRQGDIGNQFYLIEDGEAEVVKNGKGVVVTLTKGDYFGEL
ALIHETVRNATVQAKTRLKLATFDKPTFNRLLGNAIDLMRNQPRARMGMDNEYGDQSLHRSPPSTKA
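A FASTA record such as the one above consists of a header line beginning with ">" followed by the sequence, which may be wrapped over several lines. If you want to check your own files before submitting them, a small parser along the following lines may help. The function name and the short test record are hypothetical and used only for illustration.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are joined back into one string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record input, not real data.
example = """>seq1 first toy record
MKVLAG
MKVLAS
>seq2 second toy record
MRIFTC
"""
for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```

Counting the records returned is also a quick way to decide which step of the procedure to start from.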
3. Search for similar sequences using NCBI BLAST search:
4. Calculate multiple alignment using ClustalW program at Biology WorkBench.
5. Calculating the tree.
6. Processing the image.