Improving the quality of twilight‐zone alignments

1 January 2000

Research article. Published by Wiley in Protein Science

Vol. 9 (8), pp. 1487-1496
https://doi.org/10.1110/ps.9.8.1487

Abstract

Several recent publications have illustrated the advantages of using sequence profiles to recognize distant homologies between proteins. However, the practical usefulness of distant homology recognition depends not only on the sensitivity of the algorithm but also on the quality of the alignment between the prediction target and the template from the database of known proteins. Here, we study this question for several highly sensitive protein comparison algorithms whose recognition sensitivity was compared previously (Rychlewski et al., 2000). A database of protein pairs with similar structures but low sequence similarity is used to rate the alignments obtained with several different methods, including sequence–sequence, sequence–profile, and profile–profile alignment methods. We show that incorporating the evolutionary information encoded in sequence profiles into alignment calculations significantly increases alignment accuracy, bringing the resulting alignments closer to those obtained from structure comparison. In general, alignment quality is correlated with recognition and with the statistical significance of the alignment score: for every alignment method, alignments with statistically significant scores tend to involve correct structural templates and to be of good quality. At the same time, average alignment lengths differ between methods, which makes comparing them difficult. For instance, the alignments obtained by FFAS, the profile–profile alignment algorithm developed in our group, are always longer than those obtained with PSI‐BLAST. To address this problem, we develop methods to truncate or extend alignments so that they cover a specified percentage of the protein length. In most cases, the elongation of the alignment by profile–profile methods is reasonable, adding fragments of similar structure. Examples of erroneous alignments are examined, and we show that they can be identified from the quality of the resulting models.
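The abstract rates sequence-derived alignments by how closely they reproduce structure-based alignments, but it does not spell out the measure here. One common choice, assumed for illustration rather than taken from the paper, is the fraction of residue pairs in the structural (reference) alignment that the sequence-derived alignment reproduces. Because methods produce alignments of different lengths, a coverage figure is useful alongside it. The minimal Python sketch below represents an alignment as a list of (target index, template index) pairs; the function names `alignment_accuracy` and `coverage` and the `shift` tolerance are hypothetical.

```python
from typing import Iterable, Tuple

# A residue pair: (index in target, index in template), both 0-based.
Pair = Tuple[int, int]


def alignment_accuracy(predicted: Iterable[Pair], reference: Iterable[Pair],
                       shift: int = 0) -> float:
    """Fraction of reference (structure-based) pairs reproduced by `predicted`.

    With shift=0 a target residue counts only if it is aligned to exactly the
    same template residue; shift > 0 (a hypothetical tolerance) also accepts
    template positions offset by up to `shift` residues.
    """
    ref = list(reference)
    if not ref:
        return 0.0
    # Template residue that the predicted alignment assigns to each target residue.
    assigned = {i: j for i, j in predicted}
    hits = sum(
        1 for i, j in ref
        if i in assigned and abs(assigned[i] - j) <= shift
    )
    return hits / len(ref)


def coverage(predicted: Iterable[Pair], target_length: int) -> float:
    """Fraction of target residues that the alignment covers."""
    if target_length <= 0:
        return 0.0
    return len({i for i, _ in predicted}) / target_length
```

For a reference alignment `[(0, 0), (1, 1), (2, 2), (3, 4)]` and a predicted one `[(0, 0), (1, 1), (2, 3), (3, 4)]`, the exact accuracy is 0.75 and rises to 1.0 with `shift=1`. Since the accuracy is normalized by the reference, extra predicted pairs are not penalized, which is one reason to report coverage as well when comparing methods whose alignments differ in length.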
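Profile–profile methods such as FFAS compare two alignment columns (profile positions) rather than two single residues. The sketch below is a deliberately simplified stand-in, not FFAS's actual scoring: each position is a frequency vector, the column score is their dot product minus an assumed `offset` (so unrelated columns score below zero), and a Smith–Waterman local alignment with an assumed linear gap penalty `gap` is run over those scores. The names `column_score` and `profile_profile_align` and both parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple

# One frequency vector per profile position (e.g. 20 amino-acid frequencies).
Profile = Sequence[Sequence[float]]


def column_score(p: Sequence[float], q: Sequence[float], offset: float) -> float:
    """Dot product of two frequency vectors, shifted so random pairs score < 0."""
    return sum(a * b for a, b in zip(p, q)) - offset


def profile_profile_align(P: Profile, Q: Profile, gap: float = 0.5,
                          offset: float = 0.1) -> Tuple[float, List[Tuple[int, int]]]:
    """Local (Smith-Waterman) alignment of two profiles with a linear gap penalty.

    Returns the best local score and the aligned (P index, Q index) pairs.
    """
    n, m = len(P), len(Q)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + column_score(P[i - 1], Q[j - 1], offset)
            H[i][j] = max(0.0, diag, H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = column_score(P[i - 1], Q[j - 1], offset)
        if H[i][j] == H[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1  # gap in Q
        else:
            j -= 1  # gap in P
    pairs.reverse()
    return best, pairs
```

With one-hot profiles standing in for the sequences "ACGT" and "CG" over a four-letter alphabet, the sketch aligns positions 1 and 2 of the first profile to positions 0 and 1 of the second. Real profile–profile methods replace the one-hot vectors with frequencies estimated from multiple alignments of homologs, which is where the evolutionary information described in the abstract enters.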

Selected references (10 of the 52 indexed):

Comparison of sequence profiles. Strategies for structural predictions using sequence information
Protein Science, 2000
GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences
Journal of Molecular Biology, 1999
Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches
Journal of Molecular Biology, 1999
Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods
Journal of Molecular Biology, 1998
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucleic Acids Research, 1997
CATH – a hierarchic classification of protein domain structures
Structure, 1997
Structural Diversity in a Family of Homologous Proteins
Journal of Molecular Biology, 1996
Topology fingerprint approach to the inverse protein folding problem
Journal of Molecular Biology, 1992
Suboptimal sequence alignment in molecular biology
Journal of Molecular Biology, 1991
Basic local alignment search tool
Journal of Molecular Biology, 1990