In search for more accurate alignments in the twilight zone
Open Access
- 1 July 2002
- journal article
- Published by Wiley in Protein Science
- Vol. 11 (7), 1702-1713
- https://doi.org/10.1110/ps.4820102
Abstract
A major bottleneck in comparative modeling is the alignment quality; this is especially true for proteins whose distant relationships could be reliably recognized only by recent advances in fold recognition. The best algorithms excel in recognizing distant homologs but often produce incorrect alignments for over 50% of protein pairs in large fold‐prediction benchmarks. The alignments obtained by sequence–sequence or sequence–structure matching algorithms differ significantly from the structural alignments. To study this problem, we developed a simplified method to explicitly enumerate all possible alignments for a pair of proteins. This allowed us to estimate, for a given scoring method, the number of significantly different alignments that score better than the structural alignment. Using several examples of distantly related proteins, we show that for standard sequence–sequence alignment methods, the number of significantly different alignments is usually large, often about 10^10 alternatives. This number decreases when the alignment method is improved, but it is still too large for the brute force enumeration approach. More effective strategies were needed, so we evaluated and compared two well‐known approaches for searching the space of suboptimal alignments. We combined their best features and produced a hybrid method, which yielded alignments that surpassed the original alignments for about 50% of protein pairs with minimal computational effort.
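To give a feel for why the count of alignments scoring above a reference grows so quickly, the following is a minimal sketch, not the enumeration method of the paper. It assumes a plain global alignment model with a hypothetical linear scoring scheme (`match`, `mismatch`, `gap` are illustrative values), counts every alignment path rather than only "significantly different" ones, and treats adjacent gaps in opposite sequences as distinct alignments. The function names `score_histogram` and `count_better` are invented for this example.

```python
from collections import Counter


def score_histogram(a, b, match=1, mismatch=-1, gap=-2):
    """Map each global-alignment score to the number of alignments achieving it.

    Assumed model: every column is a residue pair, a gap in `a`, or a gap
    in `b`; gaps carry a linear penalty. Exact counting is only practical
    for short sequences.
    """
    # Row 0: prefixes of `b` aligned entirely against gaps.
    prev = [Counter({j * gap: 1}) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: prefix of `a` aligned entirely against gaps.
        cur = [Counter({i * gap: 1})]
        for j in range(1, len(b) + 1):
            cell = Counter()
            pair = match if a[i - 1] == b[j - 1] else mismatch
            for score, n in prev[j - 1].items():  # residue pair column
                cell[score + pair] += n
            for score, n in prev[j].items():      # a[i-1] against a gap
                cell[score + gap] += n
            for score, n in cur[j - 1].items():   # b[j-1] against a gap
                cell[score + gap] += n
            cur.append(cell)
        prev = cur
    return prev[len(b)]


def count_better(a, b, reference_score, **scoring):
    """Count alignments that score strictly above `reference_score`."""
    hist = score_histogram(a, b, **scoring)
    return sum(n for score, n in hist.items() if score > reference_score)
```

In this model the total number of alignments for two sequences of lengths m and n is the Delannoy number D(m, n), which already exceeds 10^10 for two sequences of only about 14 residues each. Here `reference_score` would stand in for the score of the structural alignment under the same scoring scheme.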