Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis
- 20 June 2008
- journal article
- Published by American Association for the Advancement of Science (AAAS) in Science
- Vol. 320 (5883) , 1632-1635
- https://doi.org/10.1126/science.1158395
Abstract
Genetic sequence alignment is the basis of many evolutionary and comparative studies, and errors in alignments lead to errors in the interpretation of evolutionary information in genomes. Traditional multiple sequence alignment methods disregard the phylogenetic implications of gap patterns that they create and infer systematically biased alignments with excess deletions and substitutions, too few insertions, and implausible insertion-deletion–event histories. We present a method that prevents these systematic errors by recognizing insertions and deletions as distinct evolutionary events. We show theoretically and practically that this improves the quality of sequence alignments and downstream analyses over a wide range of realistic alignment problems. These results suggest that insertions and sequence turnover are more common than is currently thought and challenge the conventional picture of sequence evolution and mechanisms of functional and structural changes.
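The distinction between insertions and deletions that the abstract relies on can be illustrated with a toy calculation. The sketch below is not the method presented in the paper; it is a simple unit-cost parsimony count over a rooted binary tree, written only to show that the same gap pattern implies different numbers of events depending on whether the residue is assumed absent or present in the ancestor. The function name `indel_costs`, the nested-tuple tree encoding, and the equal cost of 1 per event are all assumptions made for this illustration.

```python
from typing import Dict, Tuple, Union

# A rooted binary tree as nested tuples of leaf names (hypothetical encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def indel_costs(tree: Tree, present: Dict[str, bool]) -> Tuple[float, float]:
    """Minimum event count for one alignment column.

    Returns (cost_if_absent_at_root, cost_if_present_at_root), where each
    absent->present change on a branch is one insertion and each
    present->absent change is one deletion (unit costs, an assumption).
    """
    if isinstance(tree, str):
        inf = float("inf")
        # A leaf is compatible only with its observed state.
        return (inf, 0) if present[tree] else (0, inf)

    cost_absent, cost_present = 0, 0
    for child in tree:
        child_absent, child_present = indel_costs(child, present)
        # Parent lacks the residue: child lacks it too, or gained it (insertion).
        cost_absent += min(child_absent, child_present + 1)
        # Parent has the residue: child kept it, or lost it (deletion).
        cost_present += min(child_present, child_absent + 1)
    return (cost_absent, cost_present)


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    # Column where only sequence A carries a residue; B, C and D show gaps.
    column = {"A": True, "B": False, "C": False, "D": False}
    absent_root, present_root = indel_costs(tree, column)
    print("events if ancestor lacked the residue:", absent_root)
    print("events if ancestor had the residue:", present_root)
```

In this example, one insertion on the branch leading to A explains the column, whereas assuming the residue was ancestral requires two deletions. A procedure that scores every gap as a deletion has no way to prefer the single-insertion history, which is the kind of bias the abstract attributes to traditional methods.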