Modeling Compositional Heterogeneity
Open Access
- 1 June 2004
- journal article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 53 (3) , 485-495
- https://doi.org/10.1080/10635150490445779
Abstract
Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here and are used in two examples where the true tree is known with confidence. Likelihood ratio tests show that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, and that the data may not need to be modeled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. It is shown that when composition is not accommodated, the model does not fit and incorrect trees are found; but when composition is accommodated, the model fits and the known correct phylogenies are obtained.
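The composition fit assessment described in the abstract can be illustrated with a minimal sketch. The abstract does not name the test quantity, so the sketch assumes a Pearson X² statistic computed on a taxa-by-nucleotide table of counts, compared against a null distribution of the same statistic from data simulated on the tree and model. The function names `composition_chi_square` and `tail_area_probability` are hypothetical, and the simulation step itself is not shown: the simulated values are assumed to come from an external phylogenetic simulator.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def composition_chi_square(sequences):
    """Pearson X^2 for a taxa-by-nucleotide table of counts.

    `sequences` is a list of aligned DNA strings, one per taxon.
    Characters other than A, C, G, T (gaps, ambiguity codes) are ignored.
    """
    counts = []
    for seq in sequences:
        tally = Counter(ch for ch in seq.upper() if ch in NUCLEOTIDES)
        counts.append([tally[n] for n in NUCLEOTIDES])

    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)

    x2 = 0.0
    for row, row_total in zip(counts, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count if every taxon shared one composition.
            expected = row_total * col_total / grand_total
            if expected > 0:
                x2 += (observed - expected) ** 2 / expected
    return x2


def tail_area_probability(observed_stat, simulated_stats):
    """Fraction of simulated statistics at least as large as the observed one.

    `simulated_stats` is assumed to hold X^2 values from data sets simulated
    under the fitted tree and model (ML framework) or drawn from the
    posterior predictive distribution (Bayesian framework).
    """
    extreme = sum(1 for s in simulated_stats if s >= observed_stat)
    return extreme / len(simulated_stats)


if __name__ == "__main__":
    alignment = ["AAAAGGCT", "AAGAGGCT", "GCGCGGCC"]  # toy data, not from the paper
    observed = composition_chi_square(alignment)
    simulated = [1.2, 3.4, 0.8, 5.1, 2.2]  # placeholder values, not real simulations
    print(observed, tail_area_probability(observed, simulated))
```

A small tail-area probability would suggest that the composition implied by the model does not match the composition of the data. Using a simulated null distribution rather than the χ² curve reflects the fact that sequences related by a tree are not independent samples.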