Enzyme family classification by support vector machines
- 20 February 2004
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 55 (1) , 66-76
- https://doi.org/10.1002/prot.20045
Abstract
One approach for facilitating protein function prediction is to classify proteins into functional families. Recent studies on the classification of G‐protein coupled receptors and other proteins suggest that a statistical learning method, support vector machines (SVM), may be useful for classifying proteins into functional families. In this work, SVM is applied and tested on the classification of enzymes into functional families defined by the Enzyme Nomenclature Committee of IUBMB. An SVM classification system for each family is trained from representative enzymes of that family and seed proteins of Pfam curated protein families. The classification accuracy is in the range of 50.0% to 95.7% for enzymes from 46 families and 79.0% to 100% for non‐enzymes. The corresponding Matthews correlation coefficient is in the range of 54.1% to 96.1%. Moreover, 80.3% of the 8,291 correctly classified enzymes are uniquely classified into a specific enzyme family by using a scoring function, indicating that SVM may have a certain level of unique prediction capability. Testing results also suggest that SVM is in some cases capable of classifying distantly related enzymes and homologous enzymes of different functions. Effort is being made to use a more comprehensive set of enzymes as training sets and to incorporate multi‐class SVM classification systems to further enhance the unique prediction accuracy. Our results suggest the potential of SVM for enzyme family classification and for facilitating protein function prediction. Our software is accessible at http://jing.cz3.nus.edu.sg/cgi‐bin/svmprot.cgi. Proteins 2004.
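The abstract reports per-family accuracies for enzymes and non-enzymes together with a Matthews correlation coefficient (MCC). As a minimal sketch of how such figures are commonly derived from a binary confusion matrix, the Python below uses the standard MCC definition. It assumes that the enzyme accuracy is the fraction of family members correctly recognized and that the non-enzyme accuracy is the fraction of non-members correctly rejected; the abstract does not spell out these definitions. The function names and the counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from binary confusion-matrix counts.

    Returns a value in [-1, 1]; multiply by 100 for a percentage.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def class_accuracies(tp, tn, fp, fn):
    """Per-class accuracy (assumed definitions, see the lead-in).

    positive accuracy = TP / (TP + FN), family members recognized
    negative accuracy = TN / (TN + FP), non-members rejected
    """
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for one family, for illustration only.
tp, tn, fp, fn = 90, 950, 50, 10

pos_acc, neg_acc = class_accuracies(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)

print(f"member accuracy:     {100 * pos_acc:.1f}%")
print(f"non-member accuracy: {100 * neg_acc:.1f}%")
print(f"MCC:                 {100 * mcc:.1f}%")
```

Unlike the two per-class accuracies, MCC combines all four counts in a single number, so it stays informative when family members are far outnumbered by non-members.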