A word-oriented approach to alignment validation

Open Access

22 February 2005

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 21 (10), 2230-2239
https://doi.org/10.1093/bioinformatics/bti335

Abstract

Motivation: Multiple sequence alignment at the level of whole proteomes requires a high degree of automation, precluding the use of traditional validation methods such as manual curation. Since evolutionary models are too general to describe the history of each residue in a protein family, there is no single algorithm/model combination that can yield a biologically or evolutionarily optimal alignment. We propose a ‘shotgun’ strategy where many different algorithms are used to align the same family, and the best of these alignments is then chosen with a reliable objective function. We present WOOF, a novel ‘word-oriented’ objective function that relies on the identification and scoring of conserved amino acid patterns (words) between pairs of sequences.
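The abstract does not specify how WOOF detects or weights its words. The following is therefore only a minimal Python sketch of the general idea, not the published method. It assumes that a "word" is an exact, identical k-letter substring shared by two unaligned sequences, with k = 3 chosen arbitrarily. It also assumes that an alignment earns one point for every shared word it places residue-for-residue in the same columns. The "shotgun" step then keeps whichever candidate alignment scores highest. All function names (`word_score`, `pick_best`, and their helpers) are hypothetical.

```python
from itertools import combinations

# Illustrative sketch only; NOT the published WOOF scoring scheme.
# Assumptions: words are exact k-mers, each correctly aligned word scores 1.
GAP = "-"

def residue_to_column(aligned_row):
    """Map each ungapped residue index of a sequence to its alignment column."""
    return [col for col, ch in enumerate(aligned_row) if ch != GAP]

def shared_words(a, b, k):
    """All (i, j) start positions where a and b share an identical k-letter word."""
    index = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    return [(i, j)
            for j in range(len(b) - k + 1)
            for i in index.get(b[j:j + k], [])]

def pair_score(row_a, row_b, k):
    """Count shared words that the alignment places in identical columns."""
    a, b = row_a.replace(GAP, ""), row_b.replace(GAP, "")
    col_a, col_b = residue_to_column(row_a), residue_to_column(row_b)
    score = 0
    for i, j in shared_words(a, b, k):
        # A word counts only if every one of its residues is aligned in register.
        if all(col_a[i + t] == col_b[j + t] for t in range(k)):
            score += 1
    return score

def word_score(alignment, k=3):
    """Hypothetical objective: sum of pairwise word scores over all sequence pairs."""
    return sum(pair_score(r1, r2, k) for r1, r2 in combinations(alignment, 2))

def pick_best(candidates, k=3):
    """'Shotgun' selection: keep the candidate alignment with the highest word score."""
    return max(candidates, key=lambda aln: word_score(aln, k))

# Two candidate alignments of the same pair, e.g. produced by different aligners.
good = ["MKVLAGHW", "MK-LAGHW"]  # gap placed where the V was deleted
bad = ["MKVLAGHW", "MKLAGHW-"]   # gap pushed to the end, shifting LAGHW
print(word_score(good), word_score(bad))  # 3 0
print(pick_best([bad, good]) == good)     # True
```

In this toy case the shared words LAG, AGH and GHW are kept in register only by the first alignment, so it is the one selected. Exact identity is a deliberate simplification; a practical objective would presumably tolerate conservative substitutions and weight words by how unlikely they are to match by chance.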
