The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection
Open Access
- 5 May 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 27 (10), 2257-2267
- https://doi.org/10.1093/molbev/msq115
Abstract
The detection of positive Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The “branch-site” test is designed to detect localized episodic bouts of positive selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-positive rates for a wide range of selection schemes. Previous simulations examining the performance of the test, however, were conducted under idealized conditions without insertions, deletions, or alignment errors. As the test is sometimes used to analyze divergent sequences, the impact of indels and alignment errors is a major concern. Here, we used a recently developed indel-simulation program to examine the false-positive rate and power of the branch-site test. We find that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives. Of the alignment methods evaluated, PRANK consistently outperformed MUSCLE, MAFFT, and ClustalW, mostly because the latter programs tend to place nonhomologous codons (or amino acids) into the same column, producing shorter and less accurate alignments and giving the false impression that many amino acid substitutions have occurred at those sites. Our examination of two previous studies suggests that alignment errors may impact the analysis of mammalian and vertebrate genes by the branch-site test, and it is important to use reliable alignment methods.
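The mechanism described in the abstract, in which nonhomologous codons placed in the same column look like substitutions, can be illustrated with a toy example. The sketch below is not the branch-site test and uses no data from the study: the two sequences, their indel history, and the helper name `count_codon_mismatches` are all invented for illustration. It assumes each sequence gained an independent two-codon insertion, and compares a gapped alignment that respects that history with a shorter, gap-free alignment that forces the inserted codons into shared columns.

```python
GAP = "---"


def count_codon_mismatches(row_a, row_b):
    """Count columns where both rows hold a codon and the codons differ.

    Rows are space-separated codon strings of equal column count;
    columns containing a gap in either row are skipped.
    """
    cols_a, cols_b = row_a.split(), row_b.split()
    assert len(cols_a) == len(cols_b), "rows must have the same number of columns"
    return sum(
        1
        for a, b in zip(cols_a, cols_b)
        if GAP not in (a, b) and a != b
    )


# Hypothetical "true" alignment: A gained CCC GAT, B gained GGT CAC,
# so the inserted codons are homologous to nothing in the other sequence.
true_a = "ATG GCT CCC GAT AAA --- --- TTC"
true_b = "ATG GCT --- --- AAA GGT CAC TTC"

# Hypothetical over-compressed alignment of the same two sequences:
# no gaps, so the independent insertions share columns.
compressed_a = "ATG GCT CCC GAT AAA TTC"
compressed_b = "ATG GCT AAA GGT CAC TTC"

true_mismatches = count_codon_mismatches(true_a, true_b)
compressed_mismatches = count_codon_mismatches(compressed_a, compressed_b)

print("true alignment:      ", len(true_a.split()), "columns,", true_mismatches, "mismatches")
print("compressed alignment:", len(compressed_a.split()), "columns,", compressed_mismatches, "mismatches")
```

In this toy case the gapped alignment has eight columns and no codon differences, while the compressed alignment has six columns and three apparent codon differences, none of which corresponds to a real substitution. A codon-model analysis run on the compressed alignment would treat those columns as sites with multiple amino acid changes.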