A Dirichlet process model for detecting positive selection in protein-coding DNA sequences
- 18 April 2006
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 103 (16) , 6263-6268
- https://doi.org/10.1073/pnas.0508279103
Abstract
Most methods for detecting Darwinian natural selection at the molecular level rely on estimating the rates or numbers of nonsynonymous and synonymous changes in an alignment of protein-coding DNA sequences. In some of these methods, the nonsynonymous rate of substitution is allowed to vary across the sequence, permitting the identification of single amino acid positions that are under positive natural selection. However, it is unclear which probability distribution should be used to describe how the nonsynonymous rate of substitution varies across the sequence. One widely used solution is to model variation in the nonsynonymous rate across the sequence as a mixture of several discrete or continuous probability distributions. Unfortunately, there is little population genetics theory to inform us of the appropriate probability distribution for among-site variation in the nonsynonymous rate of substitution. Here, we describe an approach to modeling variation in the nonsynonymous rate of substitution by using a Dirichlet process mixture model. The Dirichlet process allows there to be a countably infinite number of nonsynonymous rate classes and is very flexible in accommodating different potential distributions for the nonsynonymous rate of substitution. We implemented the model in a fully Bayesian approach, with all parameters of the model considered as random variables.
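The idea of a Dirichlet process prior over rate classes can be illustrated with a small simulation. The sketch below is not the authors' implementation: it only draws from a prior, using the Chinese restaurant process representation of the Dirichlet process to assign alignment sites to rate classes, and it does no inference from sequence data. The function names, the concentration value `alpha`, and the gamma base distribution for the class rates are all assumptions made for illustration.

```python
import random


def crp_assign_sites(n_sites, alpha, rng):
    """Assign sites to rate classes by a Chinese restaurant process.

    Hypothetical helper for illustration; alpha must be positive.
    """
    assignments = []  # rate-class index for each site
    class_sizes = []  # number of sites currently in each class
    for _ in range(n_sites):
        # An existing class k is chosen with weight class_sizes[k];
        # a brand-new class is opened with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments, class_sizes


def draw_class_rates(n_classes, rng, shape=1.0, scale=1.0):
    """Draw one nonsynonymous rate per class from a gamma base distribution.

    The gamma base distribution is an assumption, not taken from the paper.
    """
    return [rng.gammavariate(shape, scale) for _ in range(n_classes)]


rng = random.Random(0)
assignments, class_sizes = crp_assign_sites(n_sites=100, alpha=1.0, rng=rng)
class_rates = draw_class_rates(len(class_sizes), rng)

# Each site inherits the rate of the class it was assigned to.
site_rates = [class_rates[k] for k in assignments]
print("number of rate classes:", len(class_sizes))
```

The number of classes is not fixed in advance: it grows with the number of sites and with `alpha`, which is what lets the prior accommodate many possible shapes for the among-site rate distribution.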