A Bayesian Mixture Model for Across-Site Heterogeneities in the Amino-Acid Replacement Process
- 1 June 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 21 (6) , 1095-1109
- https://doi.org/10.1093/molbev/msh112
Abstract
Most current models of sequence evolution assume that all sites of a protein evolve under the same substitution process, characterized by a 20 × 20 substitution matrix. Here, we propose to relax this assumption by developing a Bayesian mixture model that allows the amino-acid replacement pattern at different sites of a protein alignment to be described by distinct substitution processes. Our model, named CAT, assumes the existence of distinct processes (or classes) differing by their equilibrium frequencies over the 20 residues. Through the use of a Dirichlet process prior, the total number of classes and their respective amino-acid profiles, as well as the affiliations of each site to a given class, are all free variables of the model. In this way, the CAT model is able to adapt to the complexity actually present in the data, and it yields an estimate of the substitutional heterogeneity through the posterior mean number of classes. We show that a significant level of heterogeneity is present in the substitution patterns of proteins, and that the standard one-matrix model fails to account for this heterogeneity. By evaluating the Bayes factor, we demonstrate that the standard model is outperformed by CAT on all of the data sets which we analyzed. Altogether, these results suggest that the complexity of the pattern of substitution of real sequences is better captured by the CAT model, offering the possibility of studying its impact on phylogenetic reconstruction and its connections with structure-function determinants.
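As an illustration of how a Dirichlet process prior can leave the number of classes, the class profiles, and the site affiliations all free, the following minimal sketch draws one sample from such a prior using the Chinese restaurant process representation. It is not the authors' implementation: the function names, the concentration parameter `alpha`, and the flat Dirichlet used as the base distribution over amino-acid profiles are assumptions made here for illustration, and the sketch covers only the prior, not the likelihood or the posterior sampling.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 residues


def sample_crp_allocations(n_sites, alpha, rng):
    """Assign each site to a class with the Chinese restaurant process.

    A site joins existing class k with weight equal to that class's current
    size, or opens a new class with weight alpha (assumed concentration).
    """
    allocations = []
    class_sizes = []
    for _ in range(n_sites):
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)  # open a new class
        else:
            class_sizes[k] += 1  # join an existing class
        allocations.append(k)
    return allocations, class_sizes


def sample_profile(rng):
    """Draw equilibrium frequencies over the 20 residues from a flat
    Dirichlet (an assumed base distribution), via normalized gamma draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in AMINO_ACIDS]
    total = sum(draws)
    return [d / total for d in draws]


def sample_prior(n_sites, alpha, seed=0):
    """Return (site allocations, per-class profiles) for one prior draw."""
    rng = random.Random(seed)
    allocations, class_sizes = sample_crp_allocations(n_sites, alpha, rng)
    profiles = [sample_profile(rng) for _ in class_sizes]
    return allocations, profiles
```

Calling `sample_prior(100, 1.0)` returns a class index for each of 100 sites together with one 20-entry profile per class. Larger values of `alpha` tend to produce more classes, which is the sense in which the number of classes is not fixed in advance.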