A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny

Open Access

16 December 2008

journal article
research article
Published by Springer Nature in BMC Ecology and Evolution

Vol. 8 (1), 331
https://doi.org/10.1186/1471-2148-8-331

Abstract

Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments, combined with the average amino acid frequencies of the data set under examination (e.g. JTT + F). Variation in the evolutionary process between sites is typically modelled by a rates-across-sites distribution such as the gamma (Γ) distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that is ignored by standard empirical substitution matrices. Here we examine the degree to which the pattern of evolution at sites differs from that expected based on empirical amino acid substitution models and evaluate the impact of these deviations on phylogenetic estimation.
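To make the "+ F" idea concrete, the following is a minimal sketch of how a fixed, symmetric exchangeability matrix can be combined with a frequency vector to give a rate matrix, using the standard construction Q[i][j] = s[i][j] * pi[j] for i != j. It is not the authors' implementation: the function name `build_rate_matrix`, the toy three-state alphabet (standing in for the 20 amino acids), and all numeric values are hypothetical and chosen only for illustration.

```python
def build_rate_matrix(exchange, freqs):
    """Combine symmetric exchangeabilities with stationary frequencies.

    Off-diagonal entries are exchange[i][j] * freqs[j]; each diagonal entry
    makes its row sum to zero. The matrix is then scaled so that the expected
    substitution rate at stationarity equals one.
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchange[i][j] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


# Hypothetical exchangeabilities for a toy three-state alphabet.
EXCHANGE = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]

# Hypothetical data-set-wide average frequencies (the "+ F" case) ...
average_freqs = [0.5, 0.3, 0.2]
# ... and hypothetical frequencies for one class of sites that strongly
# favours the first state.
site_class_freqs = [0.8, 0.15, 0.05]

q_average = build_rate_matrix(EXCHANGE, average_freqs)
q_site_class = build_rate_matrix(EXCHANGE, site_class_freqs)
```

The same exchangeabilities give different rate matrices under the two frequency vectors, which is the kind of site-specific deviation from a single average-frequency model that the abstract describes.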
