A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny
Open Access
- 16 December 2008
- journal article
- research article
- Published by Springer Nature in BMC Ecology and Evolution
- Vol. 8 (1), 331
- https://doi.org/10.1186/1471-2148-8-331
Abstract
Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments that incorporate the average amino acid frequencies of the data set under examination (e.g. JTT + F). Variation in the evolutionary process between sites is typically modelled by a rates-across-sites distribution such as the gamma (Γ) distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that is ignored by standard empirical substitution matrices. Here we examine the degree to which the pattern of evolution at sites differs from that expected based on empirical amino acid substitution models and evaluate the impact of these deviations on phylogenetic estimation.
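As a rough illustration of what the "+ F" adjustment mentioned in the abstract involves, the sketch below combines a symmetric exchangeability matrix with state frequencies counted from the data to form a reversible rate matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the three-letter alphabet, the toy sequences, the exchangeability values, and the function names `empirical_frequencies` and `build_rate_matrix` are all hypothetical, and real JTT or WAG matrices cover 20 amino acids with values estimated from alignment databases.

```python
from collections import Counter


def empirical_frequencies(sequences, alphabet):
    """Pool state counts over all sequences and normalise (the "+F" step)."""
    counts = Counter(ch for seq in sequences for ch in seq if ch in alphabet)
    total = sum(counts.values())
    return [counts[state] / total for state in alphabet]


def build_rate_matrix(exch, freqs):
    """Build a reversible rate matrix Q with Q[i][j] = exch[i][j] * freqs[j].

    Diagonal entries make each row sum to zero, and Q is rescaled so that
    the expected substitution rate at stationarity equals 1.
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 when the row is summed
    scale = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[value / scale for value in row] for row in q]


# Hypothetical toy data: three states stand in for the 20 amino acids.
alphabet = "AGL"
sequences = ["AAGL", "AGGL", "AALL"]

# Made-up symmetric exchangeabilities (not JTT or WAG values).
exch = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 0.5],
    [2.0, 0.5, 0.0],
]

freqs = empirical_frequencies(sequences, alphabet)
q = build_rate_matrix(exch, freqs)
```

Because the exchangeabilities are shared across all sites and only the overall frequencies are taken from the data, every site is assigned the same stationary composition; this is the simplification that the abstract says site-specific preferences violate.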