Enhanced performance in prediction of protein active sites with THEMATICS and support vector machines
- 1 February 2008
- journal article
- Published by Wiley in Protein Science
- Vol. 17 (2), 333-341
- https://doi.org/10.1110/ps.073213608
Abstract
Theoretical microscopic titration curves (THEMATICS) is a computational method for the identification of active sites in proteins through deviations in computed titration behavior of ionizable residues. While the sensitivity to catalytic sites is high, the previously reported sensitivity to catalytic residues was not as high, about 50%. Here THEMATICS is combined with support vector machines (SVM) to improve sensitivity for catalytic residue prediction from protein 3D structure alone. For a test set of 64 proteins taken from the Catalytic Site Atlas (CSA), the average recall rate for annotated catalytic residues is 61%; good precision is maintained, with only 4% of all residues selected. The average false positive rate, using the CSA annotations, is only 3.2%, far lower than that of other 3D-structure-based methods. THEMATICS-SVM returns higher precision, a lower false positive rate, and better overall performance compared with other 3D-structure-based methods. Comparison is also made with the latest machine learning methods that are based on both sequence alignments and 3D structures. For annotated sets of well-characterized enzymes, THEMATICS-SVM performance compares very favorably with methods that utilize sequence homology. However, since THEMATICS depends only on the 3D structure of the query protein, no decline in performance is expected when applied to novel folds, proteins with few sequence homologues, or even orphan sequences. An extension of the method to predict non-ionizable catalytic residues is also presented. THEMATICS-SVM predicts a local network of ionizable residues with strong interactions between protonation events; this appears to be a special feature of enzyme active sites.
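The residue-level figures quoted in the abstract (recall, precision, false positive rate, and fraction of residues selected) follow the standard confusion-matrix definitions. The sketch below illustrates those definitions for a single protein. The function name `residue_metrics`, the residue numbers, and the protein length are hypothetical and chosen only for illustration; they are not data from the paper, and the way the paper averages these quantities over its 64 test proteins is not specified here.

```python
def residue_metrics(predicted, annotated, n_residues):
    """Per-protein residue-level metrics from standard confusion-matrix counts.

    predicted  -- residue numbers flagged by a predictor (hypothetical input)
    annotated  -- residue numbers annotated as catalytic (e.g. from a curated set)
    n_residues -- total number of residues in the protein
    """
    predicted, annotated = set(predicted), set(annotated)
    tp = len(predicted & annotated)   # annotated residues that were predicted
    fp = len(predicted - annotated)   # predicted residues that are not annotated
    fn = len(annotated - predicted)   # annotated residues that were missed
    tn = n_residues - tp - fp - fn    # everything else
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "fraction_selected": len(predicted) / n_residues,
    }


# Made-up example: a 200-residue protein, 8 residues predicted, 4 annotated.
m = residue_metrics(
    predicted={10, 45, 46, 102, 150, 151, 152, 180},
    annotated={45, 77, 102, 150},
    n_residues=200,
)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case, three of the four annotated residues are recovered while only 4% of the protein is selected, which shows how a predictor can combine a high recall with a low false positive rate when catalytic residues are a small minority of all residues.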