Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles
- 9 March 2010
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 107 (10), 4629-4634
- https://doi.org/10.1073/pnas.0910915107
Abstract
Modeling the interplay between mutation and selection at the molecular level is key to evolutionary studies. To this end, codon-based evolutionary models have been proposed as pertinent means of studying long-range evolutionary patterns and are widely used. However, these approaches have not yet consolidated results from amino acid level phylogenetic studies showing that selection acting on proteins displays strong site-specific effects, which translate into heterogeneous amino acid propensities across the columns of alignments; related codon-level studies have instead focused on either modeling a single selective context for all codon columns, or a separate selective context for each codon column, with the former strategy deemed too simplistic and the latter deemed overparameterized. Here, we integrate recent developments in nonparametric statistical approaches to propose a probabilistic model that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene. We apply the model to a dozen real protein-coding gene alignments and find it to produce biologically plausible inferences, for instance, as pertaining to site-specific amino acid constraints, as well as distributions of scaled selection coefficients. In their account of mutational features as well as the heterogeneous regimes of selection at the amino acid level, the modeling approaches studied here can form a backdrop for several extensions, accounting for other selective features, for variable population size, or for subtleties of mutational features, all with parameterizations couched within population-genetic theory.