Combining sensitive database searches with multiple intermediates to detect distant homologues.
Open Access
- 1 February 1999
- journal article
- research article
- Published by Oxford University Press (OUP) in Protein Engineering, Design and Selection
- Vol. 12 (2) , 95-100
- https://doi.org/10.1093/protein/12.2.95
Abstract
Using data from the CATH structure classification, we have assessed the blastp, fasta, Smith-Waterman and gapped-BLAST algorithms, developed a portable normalization scheme and identified safe thresholds for database searching. Of the four methods assessed, fasta, Smith-Waterman and gapped-BLAST performed similarly, whereas blastp was much less sensitive. Introducing an intermediate sequence search substantially improved the results. When tested on a set of relationships that blastp could not identify, intermediate sequences found twice as many relationships as the Smith-Waterman algorithm alone. However, the benefit of using intermediates varied considerably between families and depended not only on the number of available sequences but also on their diversity. To increase sensitivity further, we developed a multiple intermediate sequence search (MISS) procedure. When assessed on 1906 cases from a wide range of homologous families that could not be detected by the previous approaches, MISS identified 241 additional relationships. MISS exploits the full extent of sequence diversity to detect additional relationships but does not use any structure-specific information. It is therefore more generally applicable than fold recognition and threading methods, which require a library of known structures.
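The idea of linking two sequences through intermediates can be illustrated with a small sketch. It is not the authors' implementation: the scoring is a toy match/mismatch scheme with a linear gap penalty instead of a substitution matrix with affine gaps, the threshold is a raw score rather than the paper's normalized scores, and the breadth-first chaining in the hypothetical `linked_via_intermediates` function only mimics the general notion of single versus multiple intermediate searches. With `max_steps=0` it is a direct search, with `max_steps=1` a single-intermediate search, and larger values allow chains of several intermediates.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman) with a linear gap penalty.

    Toy scoring for illustration only; real searches use matrices such as BLOSUM62.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def linked_via_intermediates(query: str, target: str, database: list[str],
                             threshold: int, max_steps: int = 1) -> bool:
    """Hypothetical sketch: is target reachable from query through at most
    max_steps intermediate database sequences, each hop scoring >= threshold?"""
    frontier = {query}
    seen = {query}
    for step in range(max_steps + 1):
        # Does any sequence reached so far score above threshold against the target?
        if any(smith_waterman(s, target) >= threshold for s in frontier):
            return True
        if step == max_steps:
            break
        # Expand: database sequences significantly similar to the current frontier
        # become the next layer of intermediates.
        next_frontier = {
            d for s in frontier for d in database
            if d not in seen and smith_waterman(s, d) >= threshold
        }
        if not next_frontier:
            break
        seen |= next_frontier
        frontier = next_frontier
    return False
```

For example, with `database = ["AAAAAAAAGGGGGGGG", "GGGGGGGGCCCCCCCC"]` and a threshold of 10, `"AAAAAAAA"` and `"CCCCCCCC"` share no direct similarity and are not linked through one intermediate, but they are linked when two intermediates are allowed. This reflects the abstract's point that success depends on the diversity of the available sequences, not only on their number.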