CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences
Open Access
- 19 August 2010
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 6 (8) , e1000885
- https://doi.org/10.1371/journal.pcbi.1000885
Abstract
Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of the amino acids being exchanged. Recent developments have shown that models which allow amino acid pairs to have independent rates of substitution offer improved fit over single-rate models. However, these approaches have been limited by the need for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, with the number of classes dependent on the information content of the alignment. Given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.
Author Summary
Evolution in protein-coding DNA sequences can be modeled at three levels: nucleotides, amino acids, or the codons that encode the amino acids.
Codon models incorporate nucleotide and amino acid information, and allow the estimation of the rate at which amino acids are replaced (dN) versus the rate at which they are preserved (dS). The dN/dS ratio has been used in thousands of studies to detect molecular footprints of natural selection. A serious limitation of most codon models is the unrealistic assumption that all non-synonymous substitutions occur at the same rate. Indeed, amino acid models have consistently demonstrated that different residues are exchanged more or less frequently, depending on incompletely understood factors. We derive and validate a computational approach for inferring codon models which combine the power to investigate natural selection with data-driven amino acid substitution biases from alignments. The addition of amino acid properties can lead to more powerful and accurate methods for studying natural selection and the evolutionary history of protein-coding sequences. The pattern of amino acid substitutions specific to a given alignment can be used to compare and contrast the evolutionary properties of different genes, providing an evolutionary analog to protein family comparisons.
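The search described in the abstract, a genetic algorithm that assigns amino acid substitution pairs to rate classes and picks the number of classes from the data, can be illustrated with a minimal sketch. This is not the CodonTest implementation: the function names (`toy_score`, `run_ga`, `search_class_count`) are hypothetical, the per-pair rates are synthetic, and a penalised within-class squared error stands in for the likelihood of the phylogenetic Markov model, which the real method would optimise by maximum likelihood.

```python
import random


def toy_score(assignment, rates, n_classes, penalty=0.05):
    """Toy stand-in for model fit (NOT a phylogenetic likelihood):
    negative within-class squared error, minus a penalty per non-empty class."""
    sse = 0.0
    used = 0
    for c in range(n_classes):
        members = [rates[i] for i, a in enumerate(assignment) if a == c]
        if not members:
            continue
        used += 1
        mean = sum(members) / len(members)
        sse += sum((r - mean) ** 2 for r in members)
    return -sse - penalty * used


def run_ga(rates, n_classes, pop_size=40, generations=200,
           mutation_rate=0.05, seed=0):
    """Evolve assignments of substitution pairs to rate classes.
    Each individual is a list: position i holds the class of pair i."""
    rng = random.Random(seed)
    n = len(rates)

    def score(assignment):
        return toy_score(assignment, rates, n_classes)

    population = [[rng.randrange(n_classes) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the better half survives unchanged.
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(genes) for genes in zip(p1, p2)]
            # Mutation: occasionally reassign a pair to a random class.
            child = [rng.randrange(n_classes) if rng.random() < mutation_rate
                     else g for g in child]
            children.append(child)
        population = elite + children
    best = max(population, key=score)
    return best, score(best)


def search_class_count(rates, max_classes=4, **kwargs):
    """Run the GA for each candidate number of classes and keep the
    best penalised score (ties go to the smaller number of classes)."""
    results = {k: run_ga(rates, k, **kwargs) for k in range(1, max_classes + 1)}
    best_k = max(results, key=lambda k: results[k][1])
    return best_k, results[best_k]


if __name__ == "__main__":
    # Synthetic per-pair rates forming two obvious groups (illustration only).
    synthetic_rates = [0.1] * 5 + [2.0] * 5
    k, (assignment, fit) = search_class_count(synthetic_rates)
    print("classes:", k, "assignment:", assignment, "score:", fit)
```

The per-class penalty plays the role that a model selection criterion would play in a real analysis: adding a class is accepted only when it improves fit by more than the cost of the extra parameter. In an actual codon model the score of each candidate assignment would require refitting the remaining model parameters, which is what makes an efficient search strategy necessary.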