The genetic code can cause systematic bias in simple phylogenetic models
- 7 October 2008
- journal article
- research article
- Published by The Royal Society in Philosophical Transactions of the Royal Society B: Biological Sciences
- Vol. 363 (1512), 4003-4011
- https://doi.org/10.1098/rstb.2008.0171
Abstract
Phylogenetic analysis depends on inferential methods that accurately estimate the degree of divergence between sequences. Inaccurate estimates can lead to misleading evolutionary inferences, including incorrect tree topology estimates and poor dating of historical species divergence. Protein-coding sequences are ubiquitous in phylogenetic inference, but many of the standard methods used to describe their evolution do not explicitly account for the dependencies between sites in a codon induced by the genetic code. This study evaluates the performance of several standard methods on datasets simulated under a simple substitution model that describes codon evolution under a range of selective pressures. This approach also offers insights into the relative performance of different phylogenetic methods when there are dependencies between the sites in the data. Methods based on statistical models performed well when there was no or limited purifying selection in the simulated sequences (low degree of dependency between sites in a codon), although more biologically realistic models tended to outperform simpler models. Phylogenetic methods exhibited greater variability in performance for sequences simulated under strong purifying selection (high degree of dependency between sites in a codon). Simple models substantially underestimated the degree of divergence between sequences, and the underestimation was more pronounced on the internal branches of the tree. This underestimation resulted in some statistical methods performing poorly and exhibiting evidence of systematic bias in tree inference. Amino acid-based and nucleotide models that contained generic descriptions of spatial and temporal heterogeneity, such as mixture and temporal hidden Markov models, coped notably better, producing more accurate estimates of evolutionary divergence and tree topology.