Automatic assessment of alignment quality
Open Access
- 9 December 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 33 (22) , 7120-7128
- https://doi.org/10.1093/nar/gki1020
Abstract
Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant, solution to assess the biological accuracy of alignments automatically. Our approach is based on the comparison of several alignments of the same sequences. We introduce two functions to compare alignments: the average overlap score and the multiple overlap score. The former identifies difficult alignment cases by expressing the similarity among several alignments, while the latter estimates the biological correctness of individual alignments. We implemented both functions in the MUMSA program and demonstrate the overall robustness and accuracy of both functions on three large benchmark sets.
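The abstract names the two scores but does not give their formulas, so the following is only a minimal sketch of how alignments of the same sequences might be compared. It assumes each alignment is reduced to its set of aligned residue pairs, that the overlap of two alignments is the number of shared pairs divided by the average number of pairs, and that the multiple overlap score of one alignment is the mean fraction of the other alignments supporting each of its pairs. All function names are hypothetical and are not the MUMSA interface.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of aligned residue pairs of a gapped alignment.

    `alignment` is a list of equal-length rows; '-' marks a gap.
    A pair is ((row_i, residue_i), (row_j, residue_j)) with row_i < row_j,
    where residue_k counts ungapped positions within its own row.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for row, symbol in enumerate(column):
            if symbol != "-":
                present.append((row, counters[row]))
                counters[row] += 1
        pairs.update(combinations(present, 2))
    return pairs


def overlap(pairs_a, pairs_b):
    """Assumed overlap: shared pairs over the average pair count."""
    total = len(pairs_a) + len(pairs_b)
    return 2 * len(pairs_a & pairs_b) / total if total else 1.0


def average_overlap(alignments):
    """Mean pairwise overlap across all alignments (needs at least two)."""
    pair_sets = [aligned_pairs(a) for a in alignments]
    scores = [overlap(a, b) for a, b in combinations(pair_sets, 2)]
    return sum(scores) / len(scores)


def multiple_overlap(alignments, index):
    """Assumed per-alignment score: average support of its pairs elsewhere."""
    pair_sets = [aligned_pairs(a) for a in alignments]
    target = pair_sets[index]
    others = pair_sets[:index] + pair_sets[index + 1:]
    if not target or not others:
        return 0.0
    support = sum(sum(pair in other for other in others) for pair in target)
    return support / (len(target) * len(others))


# Three toy alignments of the same two sequences, "AC" and "AC".
alignments = [
    ["AC", "AC"],
    ["AC", "AC"],
    ["AC-", "-AC"],
]
aos = average_overlap(alignments)
mos_first = multiple_overlap(alignments, 0)
mos_last = multiple_overlap(alignments, 2)
```

Under these assumptions, a low average score would flag a set of sequences on which the alignments disagree, while the per-alignment score ranks individual alignments by how well the others support them.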