Multiple sequence alignment: In pursuit of homologous DNA positions
Open Access
- 1 February 2007
- journal article
- review article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 17 (2), 127-135
- https://doi.org/10.1101/gr.5232407
Abstract
DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to the errors and biases inherent in all sequence alignments. A survey of investigations into the strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding intermediate sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate the influence of alignment bias on the accuracy of large-scale comparative genomics.
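To make concrete what "making sequences equal in length" means, and how an alignment feeds directly into a divergence estimate, here is a minimal sketch of textbook Needleman-Wunsch pairwise global alignment followed by a p-distance (proportion of differing sites) computed over the aligned, gap-free columns. This is not the multiple-alignment procedure of any particular tool; the function names `needleman_wunsch` and `p_distance` and the scores (match = 1, mismatch = -1, linear gap = -2) are illustrative assumptions.

```python
# Hypothetical illustration: pairwise global alignment (Needleman-Wunsch) with a
# linear gap penalty, then a p-distance estimate from the resulting alignment.
# The scoring values below are arbitrary assumptions, not recommended settings.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell, inserting '-' for gaps
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def p_distance(x, y):
    """Proportion of differing sites among aligned columns that contain no gap."""
    pairs = [(c, d) for c, d in zip(x, y) if c != "-" and d != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(c != d for c, d in pairs) / len(pairs)


if __name__ == "__main__":
    x, y, s = needleman_wunsch("ACGTACGT", "ACGACGT")
    print(x)
    print(y)
    print("score:", s, "p-distance:", p_distance(x, y))
```

Because the gap and mismatch scores are arbitrary choices, different settings can place gaps in different columns and therefore change the divergence estimate computed from the same pair of sequences, which is one concrete route by which alignment bias can reach downstream analyses.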