Not So Different After All: A Comparison of Methods for Detecting Amino Acid Sites Under Selection
Open Access
- 9 February 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 22 (5) , 1208-1222
- https://doi.org/10.1093/molbev/msi105
Abstract
We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based “counting methods” that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. 
Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.
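The empirical Bayes step mentioned for the REL approach can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the rate distribution has already been discretized into a few (synonymous, nonsynonymous) rate classes with fitted weights, and that the likelihood of one alignment column under each class has been computed elsewhere (for example by a phylogenetic likelihood routine). The function name, argument names, and the numbers in the usage example are all hypothetical.

```python
from typing import Sequence, Tuple


def posterior_positive_selection(
    site_likelihoods: Sequence[float],
    weights: Sequence[float],
    rate_classes: Sequence[Tuple[float, float]],
) -> float:
    """Posterior probability that a site belongs to a class with dN > dS.

    site_likelihoods[k]: likelihood of the site's data given rate class k
                         (assumed to be computed elsewhere).
    weights[k]:          prior weight of rate class k, estimated from the
                         whole alignment (the "empirical" part).
    rate_classes[k]:     hypothetical (dS, dN) pair for rate class k.
    """
    if not (len(site_likelihoods) == len(weights) == len(rate_classes)):
        raise ValueError("all inputs must have one entry per rate class")

    # Bayes' rule: posterior weight of class k is proportional to
    # likelihood(site | class k) * prior weight of class k.
    joint = [lik * w for lik, w in zip(site_likelihoods, weights)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("site has zero likelihood under every rate class")

    # Sum the posterior mass of the classes in which dN exceeds dS.
    positive = sum(j for j, (ds, dn) in zip(joint, rate_classes) if dn > ds)
    return positive / total


# Illustrative values only: three classes (purifying, neutral, positive).
classes = [(1.0, 0.2), (1.0, 1.0), (1.0, 3.0)]
prior = [0.6, 0.3, 0.1]
likelihoods = [1e-4, 2e-4, 8e-4]
print(posterior_positive_selection(likelihoods, prior, classes))
```

A site would typically be reported as positively selected when this posterior exceeds a chosen cutoff. With few sequences the class weights themselves are poorly estimated, which is consistent with the abstract's caution about false positives for small data sets.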