An Empirical Codon Model for Protein Sequence Evolution
Open Access
- 8 March 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 24 (7) , 1464-1479
- https://doi.org/10.1093/molbev/msm064
Abstract
In the past, 2 kinds of Markov models have been considered to describe protein sequence evolution. Codon-level models have been mechanistic with a small number of parameters designed to take into account features, such as transition–transversion bias, codon frequency bias, and synonymous–nonsynonymous amino acid substitution bias. Amino acid models have been empirical, attempting to summarize the replacement patterns observed in large quantities of data and not explicitly considering the distinct factors that shape protein evolution. We have estimated the first empirical codon model (ECM). Previous codon models assume that protein evolution proceeds only by successive single nucleotide substitutions, but our results indicate that model accuracy is significantly improved by incorporating instantaneous doublet and triplet changes. We also find that the affiliations between codons, the amino acid each encodes and the physicochemical properties of the amino acids are main factors driving the process of codon evolution. Neither multiple nucleotide changes nor the strong influence of the genetic code nor amino acids' physicochemical properties form a part of standard mechanistic models and their views of how codon evolution proceeds. We have implemented the ECM for likelihood-based phylogenetic analysis, and an assessment of its ability to describe protein evolution shows that it consistently outperforms comparable mechanistic codon models. We point out the biological interpretation of our ECM and possible consequences for studies of selection.
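To make the abstract's terminology concrete, the following minimal Python sketch classifies the change between two codons by the features named above: how many nucleotide positions differ (single, doublet, or triplet), whether the change is synonymous under the standard genetic code, and how many of the differing positions are transitions. It is an illustration only, not the authors' implementation; the function name `classify_change` and the returned dictionary keys are hypothetical, and the standard genetic code (NCBI translation table 1) is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in
# TCAG order with the third position varying fastest. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}


def classify_change(codon_a, codon_b):
    """Describe the change between two codons (hypothetical helper).

    Returns a dict with:
      kind        -- 'identical', 'single', 'doublet', or 'triplet'
      synonymous  -- True if both codons encode the same amino acid
      transitions -- number of differing positions that are transitions
                     (purine<->purine or pyrimidine<->pyrimidine)
    """
    diffs = [(x, y) for x, y in zip(codon_a, codon_b) if x != y]
    kind = ("identical", "single", "doublet", "triplet")[len(diffs)]
    synonymous = GENETIC_CODE[codon_a] == GENETIC_CODE[codon_b]
    transitions = sum((x in PURINES) == (y in PURINES) for x, y in diffs)
    return {"kind": kind, "synonymous": synonymous, "transitions": transitions}


# A single-nucleotide synonymous change (both codons encode phenylalanine).
single_example = classify_change("TTT", "TTC")

# A doublet change between two serine codons: synonymous, yet not reachable
# in one step under a model that allows only single-nucleotide substitutions.
doublet_example = classify_change("AGT", "TCT")
```

Under the single-substitution assumption described in the abstract, a change classified here as `doublet` or `triplet` cannot occur instantaneously; an empirical model can instead assign such changes nonzero rates estimated from data.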