CS 177 Phylogenetics II

Tree building methods: some examples
Assessing phylogenetic data
Popular phylogenetic software packages

Disclaimers

Before describing any theoretical or practical aspects of phylogenetics, it is necessary to give some disclaimers. This area of computational biology is an intellectual minefield!

- Neither the theory nor the practical application of any algorithm is universally accepted throughout the scientific community.
- Applying different software packages to the same data set is very likely to give different answers; minor changes to a data set are also likely to change the result profoundly.

Are there correct trees?

Despite all of these problems, it is actually quite simple to use computer programs to calculate phylogenetic trees for data sets.

Provided the data are clean, outgroups are correctly specified, appropriate algorithms are chosen, no assumptions are violated, etc., can the true, correct tree be found and proven to be scientifically valid?

Unfortunately, it is impossible to ever conclusively state what the "true" tree is for a group of sequences (or a group of organisms); taxonomy is constantly under revision as new data are gathered.

Phenetics versus cladistics

- Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity.
- Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters).

Tree building methods

Data type: genetic distance / character state
Computational method: optimality criterion / clustering algorithm

                           COMPUTATIONAL METHOD
DATA TYPE     Optimality criterion                   Clustering algorithm
Characters    PARSIMONY
Distances     MINIMUM EVOLUTION                      UPGMA
              LEAST SQUARES (FITCH & MARGOLIASH)     NEIGHBOR-JOINING

Tree building (distance based)

UPGMA

- The simplest of the distance methods is UPGMA (Unweighted Pair Group Method using Arithmetic averages).
- Many multiple alignment programs, such as PILEUP, use a variant of UPGMA to create a dendrogram of DNA sequences, which is then used to guide the multiple alignment algorithm.

UPGMA: worked example

Pairwise distances among seven taxa (A-G):

       A     B     C     D     E     F     G
A      -
B      63    -
C      94    79    -
D      111   96    47    -
E      67    16    83    100   -
F      23    58    89    106   62    -
G      107   92    43    20    96    102   -

D and G are joined into the cluster DG:

       A     B     C     E     F     DG
A      -
B      63    -
C      94    79    -
E      67    16    83    -
F      23    58    89    62    -
DG     94    84    35    88    94    -

C is joined to DG, giving CDG:

       A     B     E     F     CDG
A      -
B      63    -
E      67    16    -
F      23    58    62    -
CDG    61    64    61    74    -

A and F are joined into AF:

       AF    B     E     CDG
AF     -
B      98    -
E      106   16    -
CDG    112   64    61    -

B and E are joined into BE:

       AF    BE    CDG
AF     -
BE     188   -
CDG    112   108   -

[Figure: the finished rooted tree built from the clusters AF, BE and CDG, with leaves A, F, B, E, C, D and G]
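The clustering loop behind a worked example like this can be sketched in a few lines of Python. This is a minimal sketch of textbook UPGMA, not a reconstruction of how the slides were produced: it always joins the closest pair first and updates distances with a size-weighted arithmetic mean, so its first join is B-E (distance 16) and its intermediate values need not match the reduced matrices shown in the slides. The function name `upgma` and the input layout (a lower-triangular list of rows) are assumptions made for illustration.

```python
from itertools import combinations

LABELS = list("ABCDEFG")
# Lower triangle of the starting distance matrix from the slides:
# LOWER[i][j] is the distance between LABELS[i] and LABELS[j] for j < i.
LOWER = [
    [],
    [63],
    [94, 79],
    [111, 96, 47],
    [67, 16, 83, 100],
    [23, 58, 89, 106, 62],
    [107, 92, 43, 20, 96, 102],
]


def upgma(labels, lower):
    """Textbook UPGMA; returns (tree as nested tuples, list of merges)."""
    dist = {}
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            dist[frozenset((labels[i], labels[j]))] = float(value)
    size = {name: 1 for name in labels}      # number of taxa per cluster
    tree = {name: name for name in labels}   # subtree built for each cluster
    merges = []
    while len(size) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(sorted(size), 2),
                   key=lambda pair: dist[frozenset(pair)])
        d_ab = dist[frozenset((a, b))]
        new = a + b  # illustrative naming; assumes labels do not collide
        for c in size:
            if c in (a, b):
                continue
            # Average over all taxon pairs: weight each cluster by its size.
            dist[frozenset((new, c))] = (
                size[a] * dist[frozenset((a, c))]
                + size[b] * dist[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        tree[new] = (tree.pop(a), tree.pop(b))
        merges.append((a, b, d_ab))
    (root,) = tree.values()
    return root, merges


tree, merges = upgma(LABELS, LOWER)
for a, b, d in merges:
    print(f"join {a} + {b} at distance {d}")
print(tree)
```

Under these assumptions the sketch groups the taxa into the same three clusters that appear in the slides (AF, BE and CDG) before joining them at the root.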

Maximum Parsimony (MP)

[Figure: an alignment of four sequences (ACGTGTTG, ACGTGTTG, ACGAGAAG, ACGAGCTG) and the three possible trees (I, II, III) for the four taxa, with the state A or T of one informative site mapped onto the tips and the A/T changes marked on the branches]

- Parsimony involves evaluating all possible trees for each vertical column of sequence characters (nucleotide position)
- Only informative sites are considered
- Each tree is given a score based on the number of evolutionary changes that are needed to explain the observed data
- Finally, the trees that require the smallest number of changes overall, across all sequence positions (the shortest trees), are identified

Maximum Likelihood (ML)

[Figure: the same four-sequence alignment (ACGTGTTG, ACGTGTTG, ACGAGAAG, ACGAGCTG) and the three possible trees (I, II, III), with the A/T changes marked on the branches]

- Maximum likelihood uses probability calculations based on a specific model of sequence evolution to find the tree that best accounts for the variation in a set of sequences
- All possible trees for each nucleotide position are considered
- The fewer mutations needed to fit a tree to the data, the more likely the tree
- ML resembles MP in that the tree with the fewest changes will be the most likely
- However, ML evaluates trees using explicit evolutionary models
- Thus, the method can be used to explore relationships among more diverse taxa

Computational methods for finding optimal trees

Number of possible unrooted trees for n taxa: $(2n-5)! \,/\, \left(2^{n-3}\,(n-3)!\right)$

Taxa (n)    Possible unrooted trees
2           1
3           1
4           3
5           15
6           105
7           945
8           10,395
9           135,135
10          2,027,025
...         ...
30          8.69 x 10^36

Exact algorithms
- Guarantee to find the optimal or best tree for the method of choice
- Two types are used in tree building:
  Exhaustive search: evaluates all possible unrooted trees, choosing the one with the best score for the method
  Branch-and-bound search: eliminates parts of the search tree that contain only suboptimal solutions

Heuristic algorithms
- Approximate or quick-and-dirty methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so
- Often operate by hill-climbing methods
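To make the counting formula and the idea of an exhaustive search concrete, here is a minimal Python sketch that scores every unrooted tree for the four sequences of the parsimony slide. It uses Fitch's counting algorithm for the parsimony score, which the slides do not name, and it makes two assumptions for illustration: the taxon names are paired with the sequences in the order they are listed, and the tree labels I to III are arbitrary and need not match the numbering in the figure. The unrooted trees are written as rooted nested pairs, which is safe here because the parsimony length does not depend on where the root is placed.

```python
from math import factorial

# Alignment from the parsimony slide; the name-to-sequence pairing is assumed.
ALIGNMENT = {
    "outgroup": "ACGTGTTG",
    "a": "ACGTGTTG",
    "b": "ACGAGAAG",
    "c": "ACGAGCTG",
}

# The three unrooted topologies for four taxa, as nested pairs.
TREES = {
    "I": (("outgroup", "a"), ("b", "c")),
    "II": (("outgroup", "b"), ("a", "c")),
    "III": (("outgroup", "c"), ("a", "b")),
}


def unrooted_tree_count(n):
    """Number of unrooted bifurcating trees for n >= 3 taxa."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def fitch(tree, states):
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):            # leaf: observed state, no change
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                           # children agree: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total number of changes a tree needs over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


# Exhaustive search: score every candidate tree and keep the shortest.
lengths = {name: tree_length(tree, ALIGNMENT) for name, tree in TREES.items()}
best = min(lengths, key=lengths.get)
print(lengths, "shortest:", best)

# The number of candidates grows too fast for this to scale.
print([unrooted_tree_count(n) for n in range(3, 11)])
```

With four taxa there are only three trees to score; the second printed list shows how quickly the number of candidates grows, which is why exhaustive search gives way to branch-and-bound and heuristic searches for larger data sets.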

Heuristic algorithms

[Figure: two search landscapes, one searched for the global minimum and one for the global maximum, each also containing a local minimum or maximum]

- Heuristic search algorithms are input-order dependent and can get stuck in local minima or maxima
- Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima

From NHGRI lecture, C.-B. Stewart

Assessing Phylogenetic Data

- Most data include potentially misleading evidence of relationships
- One should not only construct phylogenetic hypotheses but also assess what confidence can be placed in these hypotheses

Questions:
- How much support is there for a particular clade?
- Is there signal in the data?

How much support is there for a particular clade?

Bootstrapping/jackknifing:
- Many randomized data sets are produced by sampling the real data with replacement (or, in jackknifing, by removing a random proportion of the data)
- The frequencies with which groups occur across these data sets are a measure of support for those groups
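The resampling step can be sketched as follows. This is an illustrative Python sketch, not the procedure of any particular package: it reuses the four sequences from the parsimony slide (pairing names with sequences in listed order is an assumption), resamples alignment columns with replacement, scores the three possible four-taxon trees by parsimony with Fitch's algorithm, and counts how often each tree is the unique shortest one. The names `resample_columns` and `bootstrap_support`, the tree labels and the "tie" bucket are hypothetical choices for this example.

```python
import random
from collections import Counter

ALIGNMENT = {
    "outgroup": "ACGTGTTG",
    "a": "ACGTGTTG",
    "b": "ACGAGAAG",
    "c": "ACGAGCTG",
}
TREES = {
    "I": (("outgroup", "a"), ("b", "c")),
    "II": (("outgroup", "b"), ("a", "c")),
    "III": (("outgroup", "c"), ("a", "b")),
}


def fitch(tree, states):
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {t: s[i] for t, s in alignment.items()})[1]
        for i in range(n_sites)
    )


def resample_columns(alignment, rng):
    """One bootstrap pseudo-replicate: columns drawn with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(s[i] for i in columns) for t, s in alignment.items()}


def bootstrap_support(alignment, trees, replicates=200, seed=0):
    """Count how often each tree is the unique shortest tree."""
    rng = random.Random(seed)
    support = Counter()
    for _ in range(replicates):
        pseudo = resample_columns(alignment, rng)
        lengths = {name: tree_length(t, pseudo) for name, t in trees.items()}
        shortest = min(lengths.values())
        winners = [name for name, n in lengths.items() if n == shortest]
        support[winners[0] if len(winners) == 1 else "tie"] += 1
    return support


support = bootstrap_support(ALIGNMENT, TREES)
print({name: count / 200 for name, count in support.items()})
```

Because only one column of this tiny alignment distinguishes the three trees, every replicate that happens to miss that column ends in a tie, so the support for the best tree stays well below 100%. A jackknife variant would instead delete a random proportion of the columns in `resample_columns`.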

[Figure: trees of Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Spirostomum, Tetrahymena, Euplotes, Tracheloraphis and Gruberia, with bootstrap proportions shown on the internal branches]

Problems:
- Bootstrap proportions aren't easily interpretable
- They give no indication of how good the data are, only of how well the tree fits the data

Is there signal in the data?

Possible approach: random permutations

- Random permutation reduces any correlation among characters to that expected by chance alone
- It preserves the number of taxa, the number of characters and the character states in each character (and the theoretical maximum and minimum tree lengths)

Original structured data with strong correlations among characters:

              taxa
character     a   b   c   d
1             A   A   A   A
2             T   G   C   C
3             G   G   G   G
4             T   T   A   A
5             G   G   C   C
6             T   T   A   A
7             T   T   A   A
8             G   G   C   T

Randomly permuted data, with any correlation among characters due to chance:

              taxa
character     a   b   c   d
1             A   A   A   A
2             C   G   T   C
3             G   G   G   G
4             A   T   A   T
5             G   C   C   G
6             T   A   A   T
7             A   T   T   A
8             G   T   C   G
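The permutation step itself is small enough to sketch directly. In this minimal Python example the matrix is stored one character per row, as in the tables, and the states of each character are shuffled among the taxa independently of every other character. The function name `permute_characters` is an assumption for illustration; the two matrices are copied from the slides.

```python
import random
from collections import Counter

# One string per character (rows 1-8); positions are taxa a, b, c, d.
ORIGINAL = ["AAAA", "TGCC", "GGGG", "TTAA", "GGCC", "TTAA", "TTAA", "GGCT"]
# The permuted example shown in the slides, kept for comparison.
SLIDE_PERMUTED = ["AAAA", "CGTC", "GGGG", "ATAT", "GCCG", "TAAT", "ATTA", "GTCG"]


def permute_characters(matrix, rng):
    """Shuffle the states of each character among the taxa independently."""
    permuted = []
    for row in matrix:
        states = list(row)
        rng.shuffle(states)          # in-place shuffle of this character only
        permuted.append("".join(states))
    return permuted


def same_state_counts(matrix_a, matrix_b):
    """True if every character keeps the same number of each state."""
    return all(Counter(a) == Counter(b) for a, b in zip(matrix_a, matrix_b))


rng = random.Random(42)              # fixed seed only to make the run repeatable
permuted = permute_characters(ORIGINAL, rng)
for number, (before, after) in enumerate(zip(ORIGINAL, permuted), start=1):
    print(number, " ".join(before), "->", " ".join(after))
print(same_state_counts(ORIGINAL, permuted))
```

The final check confirms what the bullets state: the number of taxa, the number of characters and the state counts within each character are unchanged, so only the correlation among characters is disturbed. The permuted matrix from the slides passes the same check.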

Matrix Randomization Tests

- Compare some measure of data quality or hierarchical structure for the real data set and for many randomly permuted data sets
- This allows us to define a test statistic for the null hypothesis that the real data are no better structured than randomly permuted, phylogenetically uninformative data

PTP (permutation tail probability) test

[Figure: frequency distribution of a measure of data quality (e.g. tree length, ML ...) running from good to bad, with a 95% cutoff; on the good side of the cutoff the data pass the test and the null hypothesis is rejected, on the other side they fail the test]

Null hypothesis: the length of the shortest tree is what you would see given random data

How it works: reject the null if the real data have a shorter tree (the real data are more internally consistent than random data)

Comments: even a little bit of signal can lead you to reject the null; rejecting it does not necessarily mean that the signal is phylogenetic

Popular phylogenetic software packages

Review available at: http://evolution.genetics.washington.edu/phylip/software.html

PHYLIP version 3.6, Joe Felsenstein

It is available free, from its Web site, in C source code, or as executables for pre-386 DOS, 386/486/Pentium DOS, Windows 3.1, Windows95/98/NT, 68k Macintosh, or PowerMac. The C source code is easily compiled on Unix systems, and VMS compilation support is also available in the package.

It includes programs to carry out parsimony, distance matrix methods, maximum likelihood, and other methods on a variety of types of data, including DNA and RNA sequences, protein sequences, restriction sites, 0/1 discrete character data, gene frequencies, continuous characters and distance matrices.

It is the most widely distributed phylogeny package, with over 7,000 registered users, some of them satisfied. It competes with PAUP* to be the program responsible for the most published trees. It has been distributed since October 1980.

PHYLIP is distributed at the PHYLIP web site at http://evolution.genetics.washington.edu, or by anonymous ftp from evolution.genetics.washington.edu in directory pub/phylip.

All information from: http://evolution.genetics.washington.edu/phylip/software.html

PAUP* (Phylogenetic Analysis Using Parsimony and other Methods) version 4.0beta, David Swofford

PAUP* has been released as a provisional version by Sinauer Associates, of Sunderland, Massachusetts. It has Macintosh, PowerMac, Windows, and Unix/OpenVMS versions.

PAUP* is the most sophisticated parsimony program, with many options and close compatibility with MacClade. It has become much broader with the inclusion of more methods: it includes parsimony, distance matrix, invariants, and maximum likelihood methods, and many indices and statistical tests.

It is described in a web page at http://www.sinauer.com/Titles/frswofford.htm, and in more detail at its web site at the LMS at http://www.lms.si.edu/PAUP/about.html.

The price is $100 US for the Macintosh and PowerMac executable versions, $85 for the Windows executable version, and $150 for the Unix source code version, plus $20 for shipment.

MrBayes: Bayesian Inference of Phylogeny

MrBayes is a program for Bayesian inference of phylogeny using Markov chain Monte Carlo methods. Available for Mac, PC, and Unix.

MacClade, Wayne Maddison and David Maddison

MacClade is described on its Web page, at http://phylogeny.arizona.edu/macclade/macclade.html. A demonstration version of MacClade 3 is also available there.

MacClade enables you to use the mouse-window interface to specify and rearrange phylogenies by hand, and watch the number of character steps and the distribution of states of a given character on the tree change as you do so. Available for Macintosh only.

All distribution is by Sinauer Associates, 23 Plumtree Road, Sunderland, Massachusetts 01375-0407, USA. A disk with program, help file, and example data files, plus book (which has about 100 pages of introduction to phylogenetic theory and 250 pages of program instructions), is $100 U.S. ($40 for the book alone).

[Figure: MacClade trees (panels A and B) including M. cephalotes, M. phaeocephalus, M. panamensis, M. ferox, M. barbirostris, M. tuberculifer (Ecuador), M. tuberculifer (Argentina), samples labelled swainsoni, pelzelni, phaeonotus-pelzelni, swainsoni-pelzelni, swainsoni-ferocior and ferocior, M. tyrannulus, Rhytipterna immunda and Tyrannus caudifasciatus; breeding range (Northern South America or Southern South America) is traced on the tree]

RASA, version 2.5, James Lyons-Weiler

Software for Macintoshes that will perform "Relative Apparent Synapomorphy Analysis", a test for the presence of phylogenetic signal in any type of discrete character data matrix (morphological or molecular). The RASA program carries out the test and plots the results. RASA is menu-driven.

The test compares the observed and null rates of increase in cladistic similarity among pairs of taxa predicted by an increase in the phenetic similarity among taxon pairs.

The programs are available as Macintosh executables from their web page at http://bio.uml.edu/LW/RASA.html.

TCS version 1.06, Mark Clement and David Posada

A program for estimating gene genealogies within a population. It does so by using the method introduced in the paper:

Templeton, A. R., K. A. Crandall and C. F. Sing. 1992. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132: 619-633.

This method connects existing haplotypes in a minimum spanning tree, which is essentially a parsimony method. It can also infer networks with loops in them.

TCS is written in Java and has a graphical user interface for the display of the resulting networks. It may be run on any system that has the Java runtime environment. The program is described in the paper:

Clement M., D. Posada, and K. Crandall. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9: 1657-1660.

TCS is available as Java executables, with documentation, at its web site at: http://bioag.byu.edu/zoology/crandall_lab/tcs.htm.

[Figure: haplotype network estimated with TCS; nodes are labelled with sample codes from groups A to G (e.g. A1-A10, B1-B9, C1-C9, D1-D6, E1-E5, F1-F10, G1-G10) and asterisks mark some nodes]

BioEdit version 4.8.4, Tom Hall

This is a sequence editor with many kinds of general molecular biology functions available (alignment, BLAST searches, plasmid drawing, restriction mapping, sequence machine trace viewing, etc.).

For our purposes the feature worth mentioning is that it comes with a number of existing phylogeny programs which can be run automatically from within BioEdit. These are: TreeView, fastDNAml, and six DNA and protein programs from PHYLIP.

BioEdit is available as Windows95/98/NT executables from its web site at http://www.mbio.ncsu.edu/RNaseP/info/programs/BIOEDIT/bioedit.html.

TreeView, Rod Page

A program for displaying trees on Apple Macs and Windows PCs. It can draw rooted and unrooted trees, display bootstrap values, and supports the native font and graphics file formats of both Macs and PCs.

The program reads NEXUS, PHYLIP, and Hennig86 style tree files (including files produced by fastDNAml and CLUSTALW), and can save trees in the same formats, so it can convert trees among these formats. TreeView can read up to 100 trees with up to 500 taxa.

The program is free, and can be obtained by World Wide Web from http://taxonomy.zoology.gla.ac.uk/rod/treeview.html. It comes in 68K Mac, PowerMac, and Windows 95/NT executable versions (and in a Windows 3.1 executable for version 1.4). There is also online help, including an online manual.

DnaSP version 3.53, Julio Rozas and Ricardo Rozas

A software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DnaSP can estimate several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites), as well as linkage disequilibrium, recombination, gene flow and gene conversion parameters.

It can also carry out several tests of neutrality. Additionally, it can estimate the confidence intervals of some test statistics by the coalescent. The results of the analyses are displayed in tabular and graphic form.

For the purposes of this web site, the relevant features are the calculation of measures of population divergence, which include the Jukes-Cantor method, which can be used as a distance in phylogeny reconstruction.

It is distributed as a Windows95/98/NT executable from its web site at http://www.bio.ub.es/~julio/DnaSP.html.

Arlequin version 2.0, Laurent Excoffier

A program for population genetics analysis. It can perform many kinds of population genetic tasks, including estimation of gene frequencies, testing of linkage disequilibrium, and analysis of diversity between populations.

For the purposes of this list, the relevant feature is its ability to compute a variety of genetic distance measures, including the Jukes-Cantor distance, the Kimura 2-parameter distance, and the Tamura-Nei distance, each of these with or without correction for gamma-distributed rates of evolution. It can also compute a Minimum Spanning Tree network.

Arlequin has its interactive "front end" written in Java, and requires the Java Runtime Environment (which is available from the Arlequin site for those who do not already have it). The core routines are available as binaries for Windows95/98/NT/2000, for MacOS for the PowerPC processor, and for Linux for Intel-compatible x86 processors.

The binaries, Java code, Java Runtime Environment, and a PDF documentation file are available at its web site at http://acasun1.unige.ch/arlequin/.
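As a small illustration of the Jukes-Cantor distance that DnaSP and Arlequin are described as computing, the standard formula is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of aligned sites at which two sequences differ. The Python sketch below is not taken from either package; the function names are hypothetical, it assumes gap-free sequences of equal length, and it borrows two sequences from the parsimony slide as input.

```python
from math import log


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(x != y for x, y in zip(seq1, seq2))
    return differences / len(seq1)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        raise ValueError("p must be below 0.75")
    return -0.75 * log(1 - 4 * p / 3)


# Two sequences from the parsimony slide (differences at sites 6 and 7).
p = p_distance("ACGAGAAG", "ACGAGCTG")
d = jukes_cantor(p)
print(f"p = {p:.3f}, Jukes-Cantor d = {d:.4f}")
```

The corrected distance is slightly larger than the raw proportion of differences because it allows for multiple substitutions at the same site. A matrix of such distances is the kind of input that distance methods like UPGMA and neighbor-joining work from.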
