Descriptor-based protein remote homology identification
Open Access
- 1 February 2005
- journal article
- Published by Wiley in Protein Science
- Vol. 14 (2) , 431-444
- https://doi.org/10.1110/ps.041035505
Abstract
Here, we report a novel remote homology identification method based on protein sequence descriptors, able to infer fold relationships without explicit knowledge of structure. In the first phase, we individually benchmarked 13 different descriptor types in fold identification experiments on a highly diverse set of protein sequences. Each descriptor type was related to fold class membership using simple similarity measures in the descriptor space, such as the cosine angle. Our results revealed that the three best-performing sets of descriptors were the sequence-alignment-based descriptors using PSI-BLAST e-values, the descriptors based on the alignment of secondary structural elements (SSEA), and the descriptors based on the occurrence of PROSITE functional motifs. In the second phase, these three top-performing descriptor sets were combined to obtain a final method with improved performance, which we named DescFold. Class membership was predicted by Support Vector Machine (SVM) learning. Compared with the PSI-BLAST-based descriptor alone, the rate of remote homology identification increased from 33.7% to 46.3%. We found that the composite set of descriptors was able to identify the true remote homolog for nearly every sixth sequence at the 95% confidence level, some 10% more than a single PSI-BLAST search. We benchmarked DescFold against several other state-of-the-art fold recognition algorithms on the 172 LiveBench-8 targets and concluded that it adds value to existing techniques by providing a confident hit for at least 10% of the sequences not identifiable by previously known methods.
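The abstract states that, in the first phase, each descriptor type was related to fold class membership through a simple similarity measure such as the cosine angle. The Python sketch below illustrates that idea under assumptions that do not come from the paper: the PSI-BLAST descriptor is modeled as a vector of -log10 e-values against a fixed library of reference sequences (`evalue_descriptor`), fold assignment uses a nearest-neighbour rule over that library (`predict_fold`), and descriptor types are merged by concatenating unit-normalized blocks (`combine`). All function names are hypothetical, and DescFold itself combines its descriptors with an SVM, which this sketch does not implement.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two descriptor vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("descriptor vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def evalue_descriptor(evalues: Sequence[float], floor: float = 1e-300) -> List[float]:
    """HYPOTHETICAL transform: one component per reference sequence, scored as
    -log10(e-value); hits with e >= 1 contribute 0. `floor` avoids log10(0)."""
    return [max(0.0, -math.log10(max(e, floor))) for e in evalues]


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def combine(*descriptors: Sequence[float]) -> List[float]:
    """HYPOTHETICAL combination: concatenate unit-normalized descriptor blocks so
    that no single descriptor type dominates the cosine because of its scale."""
    combined: List[float] = []
    for d in descriptors:
        combined.extend(unit(d))
    return combined


def predict_fold(
    query: Sequence[float], library: Sequence[Tuple[str, Sequence[float]]]
) -> Tuple[Optional[str], float]:
    """Nearest-neighbour rule (an assumption): return the fold label of the library
    entry whose descriptor has the largest cosine with the query, and that cosine."""
    best_label: Optional[str] = None
    best_score = -math.inf
    for label, vec in library:
        score = cosine(query, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    # Toy library: two hypothetical folds, three reference sequences per descriptor.
    library = [("fold_A", [3.0, 0.0, 1.0]), ("fold_B", [0.0, 2.0, 2.0])]
    query = evalue_descriptor([1e-3, 5.0, 1e-1])  # approximately [3.0, 0.0, 1.0]
    print(predict_fold(query, library))
```

When every block is non-zero, the cosine between two combined vectors equals the mean of the per-descriptor cosines, so this concatenation acts as a simple equal-weight baseline rather than a learned combination like the SVM used for DescFold.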