Fold recognition by combining profile-profile alignment and support vector machine
Open Access
- 15 March 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (11), 2667-2673
- https://doi.org/10.1093/bioinformatics/bti384
Abstract
Motivation: Currently, the most accurate fold-recognition approach is to perform profile–profile alignments and estimate the statistical significance of those alignments by calculating a Z-score or E-value. Although this scheme is reliable in recognizing relatively close homologs related at the family level, it has difficulty finding remote homologs that are related at the superfamily or fold level.
Results: In this paper, we present an alternative method for estimating the significance of the alignments. The alignment between a query protein and a template of length n in the fold library is transformed into a feature vector of length n + 1, which is then evaluated by a support vector machine (SVM). The SVM output is converted to a posterior probability that the query sequence is related to the template. The new method performs significantly better than PSI-BLAST and profile–profile alignment with the Z-score scheme. While PSI-BLAST and the Z-score scheme detect 16% and 20% of superfamily-related proteins, respectively, at 90% specificity, the new method detects 46% of these proteins, a more than 2-fold increase in sensitivity. More significantly, at the fold level, the new method detects 14% of remotely related proteins at 90% specificity, a remarkable result given that the other methods detect almost none at the same level of specificity.
Contact: kds@kaist.ac.kr
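The two transformations named in the abstract can be illustrated with a minimal sketch. The abstract does not say what the n + 1 features are, nor how the SVM output is mapped to a probability, so everything below is an assumption: the vector is taken to hold one alignment score per template position plus the total score, and the conversion is taken to be a Platt-style sigmoid. The function names and the parameters `a` and `b` are hypothetical; in practice such parameters would be fitted on held-out SVM outputs.

```python
import math
from typing import List, Optional, Sequence


def alignment_to_features(
    per_position_scores: Sequence[Optional[float]],
) -> List[float]:
    """Turn an alignment to a template of length n into n + 1 features.

    per_position_scores[i] is the score at template position i, or None
    if that position is unaligned. This encoding is an assumption made
    for illustration; the abstract only states the vector length.
    """
    # Unaligned template positions contribute a score of zero.
    position_features = [
        0.0 if score is None else float(score)
        for score in per_position_scores
    ]
    # The extra (n + 1)-th feature is assumed to be the total score.
    return position_features + [sum(position_features)]


def svm_output_to_posterior(
    decision_value: float, a: float = -1.0, b: float = 0.0
) -> float:
    """Map an SVM decision value f to P(related | f) with a sigmoid.

    Uses the Platt-style form 1 / (1 + exp(a * f + b)). The defaults for
    a and b are placeholders, not values from the paper.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


if __name__ == "__main__":
    # Template of length 3 whose second position is unaligned.
    features = alignment_to_features([1.5, None, 2.0])
    print(len(features), features)
    print(svm_output_to_posterior(0.0))
```

With a negative `a`, larger decision values give higher posterior probabilities, so templates can be ranked on a common probability scale regardless of their length.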