Accounting for Uncertainty in the Tree Topology Has Little Effect on the Decision-Theoretic Approach to Model Selection in Phylogeny Estimation
Open Access
- 17 November 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 22 (3) , 691-703
- https://doi.org/10.1093/molbev/msi050
Abstract
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigorous search of the tree space is run under that model to find the maximum-likelihood estimate of the tree (topology and branch lengths) and the maximum-likelihood estimates of the model parameters. In this paper, we propose two extensions to the decision-theoretic (DT) approach that relax the fixed-topology restriction. We also relax the fixed-topology restriction for the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) methods. We compare the performance of the different methods (the relaxed, restricted, and the likelihood-ratio test [LRT]) using simulated data. This comparison is done by evaluating the relative complexity of the models resulting from each method and by comparing the performance of the chosen models in estimating the true tree. We also compare the methods relative to one another by measuring the closeness of the estimated trees corresponding to the different chosen models under these methods. We show that varying the topology does not have a major impact on model choice. We also show that the outcome of the two proposed extensions is identical and is comparable to that of the BIC, Extended-BIC, and DT. Hence, using the simpler methods in choosing a model for analyzing the data is more computationally feasible, with results comparable to the more computationally intensive methods. Another outcome of this study is that earlier conclusions about the DT approach are reinforced. That is, LRT, Extended-AIC, and AIC result in more complicated models that do not contribute to the performance of the phylogenetic inference, yet cause a significant increase in the time required for data analysis.
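As a rough illustration of how the AIC and BIC named in the abstract rank candidate substitution models, the sketch below applies the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. It is not the paper's procedure: the model names, log-likelihoods, parameter counts, and alignment length are hypothetical values chosen only to show how the two criteria can disagree, and the helper names (`aic`, `bic`, `select_models`) are assumptions of this example. Whether each log-likelihood is obtained on a fixed topology or after a tree search is left outside the sketch.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical inputs: model name -> (maximized log-likelihood, free parameters).
# These numbers are invented for illustration and are not results from the paper.
candidates = {
    "JC69": (-5230.0, 0),
    "HKY85": (-5105.0, 4),
    "GTR+G": (-5098.0, 9),
}
n_sites = 1000  # assumed alignment length, used as the sample size n in BIC

def select_models(candidates, n_sites):
    """Return the model minimizing AIC and the model minimizing BIC."""
    aic_scores = {m: aic(ll, k) for m, (ll, k) in candidates.items()}
    bic_scores = {m: bic(ll, k, n_sites) for m, (ll, k) in candidates.items()}
    best_aic = min(aic_scores, key=aic_scores.get)
    best_bic = min(bic_scores, key=bic_scores.get)
    return best_aic, best_bic

best_aic, best_bic = select_models(candidates, n_sites)
print("AIC choice:", best_aic)
print("BIC choice:", best_bic)
```

With these made-up inputs the heavier ln n penalty leads BIC to the simpler HKY85 model while AIC picks GTR+G, which is the kind of difference in model complexity the abstract describes between the two families of criteria.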