Improving the quality of twilight‐zone alignments
- 1 January 2000
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 9 (8), 1487–1496
- https://doi.org/10.1110/ps.9.8.1487
Abstract
Several recent publications have illustrated the advantages of using sequence profiles to recognize distant homologies between proteins. At the same time, the practical usefulness of distant homology recognition depends not only on the sensitivity of the algorithm but also on the quality of the alignment between a prediction target and the template from the database of known proteins. Here, we study this question for several supersensitive protein algorithms whose recognition sensitivity was previously compared (Rychlewski et al., 2000). A database of protein pairs with similar structures but low sequence similarity is used to rate the alignments obtained with several different methods, including sequence–sequence, sequence–profile, and profile–profile alignment methods. We show that incorporating the evolutionary information encoded in sequence profiles into alignment calculations significantly increases alignment accuracy, bringing the alignments closer to those obtained from structure comparison. In general, alignment quality is correlated with recognition and with alignment score significance. For every alignment method, alignments with statistically significant scores correlate with both correct structural templates and good-quality alignments. At the same time, average alignment lengths differ among methods, which makes comparing them difficult. For instance, the alignments obtained by FFAS, the profile–profile alignment algorithm developed in our group, are always longer than the alignments obtained with the PSI‐BLAST algorithms. To address this problem, we develop methods to truncate or extend alignments to cover a specified percentage of protein lengths. In most cases, the elongation of the alignment by profile–profile methods is reasonable and adds fragments of similar structure. Examples of erroneous alignments are examined, and it is shown that they can be identified based on model quality.
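The abstract does not specify how alignment quality is scored or how alignments are truncated to a given coverage. The Python below is a minimal sketch under stated assumptions and is not the authors' method. It represents an alignment as a list of (query, template) residue-index pairs and scores it against a structure-based reference alignment in two ways. It then trims the alignment to a target fraction of the query length. The names `coverage`, `alignment_accuracy`, `alignment_precision` and `truncate_to_coverage` are hypothetical. So is the trimming rule, which drops whichever end pair has the lower per-pair score.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: each aligned pair is (query index, template index).
Pair = Tuple[int, int]


def coverage(alignment: Sequence[Pair], query_length: int) -> float:
    """Fraction of query residues that appear in the alignment."""
    return len(alignment) / query_length


def alignment_accuracy(alignment: Sequence[Pair], reference: Sequence[Pair]) -> float:
    """Fraction of structure-based reference pairs reproduced exactly."""
    if not reference:
        return 0.0
    ref = set(reference)
    return sum(1 for pair in alignment if pair in ref) / len(reference)


def alignment_precision(alignment: Sequence[Pair], reference: Sequence[Pair]) -> float:
    """Fraction of predicted pairs that agree with the reference (penalizes over-long alignments)."""
    if not alignment:
        return 0.0
    ref = set(reference)
    return sum(1 for pair in alignment if pair in ref) / len(alignment)


def truncate_to_coverage(
    alignment: Sequence[Pair],
    scores: Sequence[float],
    query_length: int,
    target: float,
) -> List[Pair]:
    """Trim pairs from the alignment ends until coverage <= target.

    Hypothetical rule: at each step, drop whichever end pair has the lower score.
    """
    if len(scores) != len(alignment):
        raise ValueError("need one score per aligned pair")
    start, end = 0, len(alignment)  # half-open window [start, end)
    while end > start and (end - start) / query_length > target:
        if scores[start] <= scores[end - 1]:
            start += 1
        else:
            end -= 1
    return list(alignment[start:end])


# Usage: a structural reference maps query residues 0..9 onto template 2..11.
# The predicted alignment's last two pairs are shifted by one residue.
reference = [(i, i + 2) for i in range(10)]
alignment = [(i, i + 2) for i in range(8)] + [(8, 11), (9, 12)]
scores = [1.0] * 8 + [0.1, 0.2]  # hypothetical per-pair scores

print(coverage(alignment, 20),
      alignment_accuracy(alignment, reference),
      alignment_precision(alignment, reference))  # 0.5 0.8 0.8

trimmed = truncate_to_coverage(alignment, scores, 20, 0.4)
print(len(trimmed),
      alignment_accuracy(trimmed, reference),
      alignment_precision(trimmed, reference))  # 8 0.8 1.0
```

In this toy case, truncation raises precision from 0.8 to 1.0 while accuracy stays at 0.8. This shows why methods that produce alignments of different lengths are hard to compare until they are brought to a common coverage. Extending alignments is not shown.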
This publication has 52 references indexed in Scilit:
- Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science, 2000
- GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. Journal of Molecular Biology, 1999
- Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. Journal of Molecular Biology, 1999
- Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. Journal of Molecular Biology, 1998
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
- CATH – a hierarchic classification of protein domain structures. Published by Elsevier, 1997
- Structural Diversity in a Family of Homologous Proteins. Journal of Molecular Biology, 1996
- Topology fingerprint approach to the inverse protein folding problem. Journal of Molecular Biology, 1992
- Suboptimal sequence alignment in molecular biology. Journal of Molecular Biology, 1991
- Basic local alignment search tool. Journal of Molecular Biology, 1990