Identifying sites under positive selection with uncertain parameter estimates
Open Access
- 1 July 2006
- journal article
- Published by Canadian Science Publishing in Genome
- Vol. 49 (7) , 767-776
- https://doi.org/10.1139/g06-038
Abstract
Codon-based substitution models are routinely used to measure selective pressures acting on protein-coding genes. To this effect, the nonsynonymous to synonymous rate ratio (dN/dS = ω) is estimated. The proportion of amino-acid sites potentially under positive selection, as indicated by ω > 1, is inferred by fitting a probability distribution where some sites are permitted to have ω > 1. These sites are then inferred by means of an empirical Bayes or by a Bayes empirical Bayes approach that, respectively, ignores or accounts for sampling errors in maximum-likelihood estimates of the distribution used to infer the proportion of sites with ω > 1. Here, we extend a previous full-Bayes approach to include models with high power and low false-positive rates when inferring sites under positive selection. We propose some heuristics to alleviate the computational burden, and show that (i) full Bayes can be superior to empirical Bayes when analyzing a small data set or small simulated data, (ii) full Bayes has only a small advantage over Bayes empirical Bayes with our small test data, and (iii) Bayesian methods appear relatively insensitive to mild misspecifications of the random process generating adaptive evolution in our simulations, but in practice can prove extremely sensitive to model specification. We suggest that the codon model used to detect amino acids under selection should be carefully selected, for instance using Akaike information criterion (AIC).
Key words: codon substitution models, empirical Bayes, Bayes empirical Bayes, full Bayes, ROC curves, AIC.
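As an illustration of the quantities the abstract refers to, the following minimal sketch computes the empirical Bayes posterior probability that a single site belongs to a class with ω > 1, holding the class proportions fixed at their estimates, and compares two models by AIC. It is not the authors' implementation: the function names, the three ω classes, the per-site likelihoods, and the log-likelihoods are all hypothetical values chosen for the example. Bayes empirical Bayes and full Bayes would additionally integrate over uncertainty in the proportions and ω values, which this sketch does not attempt.

```python
def empirical_bayes_posteriors(site_likelihoods, class_proportions):
    """Posterior probability of each omega class at one site.

    site_likelihoods[k]  : likelihood of the site's data given omega class k
    class_proportions[k] : estimated proportion of sites in class k,
                           treated as known (sampling error is ignored)
    """
    joint = [p * lik for p, lik in zip(class_proportions, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is preferred)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical three-class model: purifying, neutral, positive selection.
omegas = [0.1, 1.0, 2.5]
proportions = [0.7, 0.2, 0.1]
# Hypothetical likelihoods of one site's codon column under each class.
site_lik = [1e-6, 4e-6, 2e-5]

posteriors = empirical_bayes_posteriors(site_lik, proportions)
# Probability that the site is under positive selection (omega > 1).
p_positive = sum(p for p, w in zip(posteriors, omegas) if w > 1)

# Hypothetical model comparison: (log-likelihood, number of parameters).
aic_simple = aic(-1000.0, 3)
aic_rich = aic(-995.0, 5)
preferred = "rich" if aic_rich < aic_simple else "simple"

print(f"P(omega > 1 | site data) = {p_positive:.3f}")
print(f"AIC simple = {aic_simple}, AIC rich = {aic_rich}, preferred: {preferred}")
```

With these made-up numbers the site would be flagged only under a lenient posterior cutoff, which shows why the estimated proportions matter: small changes in them shift the posterior, and that is the sensitivity the empirical Bayes approach ignores.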