Not So Different After All: A Comparison of Methods for Detecting Amino Acid Sites Under Selection

Open Access

9 February 2005

journal article
research article
Published by Oxford University Press (OUP) in Molecular Biology and Evolution

Vol. 22 (5), 1208-1222
https://doi.org/10.1093/molbev/msi105

Abstract

We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based “counting methods” that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. 
Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.
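
The site-level decisions behind the REL and FEL approaches described above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the example rate classes, and the input likelihoods are hypothetical, and the per-site likelihoods are assumed to have been computed elsewhere by a phylogenetic likelihood engine. The REL part applies Bayes' rule over a discrete set of (synonymous, nonsynonymous) rate classes whose weights are assumed to have been estimated from the whole alignment. The FEL part assumes a likelihood ratio test with one degree of freedom, comparing a fit with both rates free against a fit with the two rates constrained to be equal.

```python
import math


def rel_posterior_positive(site_likelihoods, weights, rate_classes):
    """Empirical Bayes posterior probability that a site has beta > alpha.

    site_likelihoods[k]: likelihood of the site's data under rate class k
    weights[k]:          estimated weight of class k (weights sum to 1)
    rate_classes[k]:     (alpha, beta) = (synonymous, nonsynonymous) rates
    All inputs are hypothetical placeholders for values fitted elsewhere.
    """
    # Joint probability of the site's data and each rate class.
    joint = [w * lik for w, lik in zip(weights, site_likelihoods)]
    total = sum(joint)
    # Posterior mass on classes where nonsynonymous exceeds synonymous rate.
    positive = sum(
        j for j, (alpha, beta) in zip(joint, rate_classes) if beta > alpha
    )
    return positive / total


def fel_lrt_pvalue(lnl_free, lnl_constrained):
    """P-value of a likelihood ratio test, assuming 1 degree of freedom.

    lnl_free:        site log-likelihood with alpha and beta both free
    lnl_constrained: site log-likelihood with alpha constrained equal to beta
    """
    stat = max(0.0, 2.0 * (lnl_free - lnl_constrained))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Three illustrative classes: purifying, neutral, diversifying.
    classes = [(1.0, 0.1), (1.0, 1.0), (1.0, 3.0)]
    weights = [0.5, 0.3, 0.2]
    likelihoods = [0.001, 0.002, 0.01]
    print(rel_posterior_positive(likelihoods, weights, classes))
    print(fel_lrt_pvalue(-10.0, -12.5))
```

In the REL sketch the class weights are shared by all sites, so a site's posterior depends on how well those weights were estimated from the alignment; in the FEL sketch each site is tested using only its own fitted likelihoods. This is one way to picture why the two approaches can behave differently on data sets comprising few sequences.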
