Prediction of protein structural classes using support vector machines

20 April 2006

journal article
research article
Published by Springer Nature in Amino Acids

Vol. 30 (4) , 469-475
https://doi.org/10.1007/s00726-005-0239-0

Abstract

The support vector machine, a machine-learning method, is used to predict four structural classes, i.e. mainly α, mainly β, α–β and fss (few secondary structures), at the topology level of the CATH protein structure database. In binary classification, any two structural classes that share no secondary-structure elements (such as α and β elements) can be classified with accuracy as high as 90%. The accuracy, however, drops below 70% when the classes to be separated have structure elements in common. Our study also shows that feature spaces of dimension 20² = 400 (dipeptide) and 20³ = 8000 (tripeptide) give nearly the same prediction accuracy. Across all 4 structural classes, multi-class classification gives an overall accuracy of about 52%, indicating that the multi-class classification technique for support vector machines may still need further improvement in future investigation.
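The feature-space dimensions quoted above follow from counting every possible k-residue word over the 20 standard amino acids. The sketch below is an illustration only, not the authors' code: the function name `kmer_composition`, the normalization to relative frequencies, the skipping of non-standard residues, and the example sequence are all assumptions made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_composition(sequence, k=2):
    """Return a k-peptide frequency vector of length 20**k (hypothetical helper)."""
    # Map every possible k-residue word to a fixed position in the vector.
    index = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    counts = [0.0] * len(index)
    total = 0
    # Slide a window of width k along the sequence and count each word.
    for start in range(len(sequence) - k + 1):
        position = index.get(sequence[start:start + k])
        if position is not None:  # assumption: ignore windows with non-standard residues
            counts[position] += 1
            total += 1
    # Assumption: normalize counts to relative frequencies.
    if total:
        counts = [c / total for c in counts]
    return counts

# Made-up example sequence, used only to show the vector sizes.
dipeptide = kmer_composition("MKVLAAGIVGL", k=2)
tripeptide = kmer_composition("MKVLAAGIVGL", k=3)
print(len(dipeptide), len(tripeptide))
```

Vectors of this kind could then be passed to any support vector machine implementation; the kernel and parameters used in the study are not stated in the abstract.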

Keywords

Support vector machines – CATH – Multi-class – Protein structural class prediction – Jackknifing
