ResBoost: characterizing and predicting catalytic residues in enzymes
Open Access
- 27 June 2009
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 10 (1) , 197
- https://doi.org/10.1186/1471-2105-10-197
Abstract
Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed. We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA). ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.
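To make the idea of combining threshold-style rules of thumb concrete, the following is a minimal sketch of plain AdaBoost over single-feature threshold rules (decision stumps). It is not the ResBoost implementation: it omits Alternating Decision Trees, and it illustrates the sensitivity/specificity trade-off simply by moving a cutoff on the ensemble score, which may differ from the mechanism used in the paper. The feature values, the two chosen features, and all function names are hypothetical and invented for illustration only.

```python
import math

# Hypothetical toy residues: (conservation, solvent_accessibility), both in [0, 1].
# Label +1 = catalytic, -1 = non-catalytic. All values are invented.
RESIDUES = [
    ((0.95, 0.30), 1),
    ((0.90, 0.25), 1),
    ((0.85, 0.40), 1),
    ((0.88, 0.80), -1),
    ((0.40, 0.35), -1),
    ((0.30, 0.70), -1),
    ((0.55, 0.20), -1),
    ((0.20, 0.10), -1),
]


def stump_predict(stump, x):
    """A rule of thumb: vote `sign` if the feature exceeds the threshold, else -sign."""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign


def best_stump(data, weights):
    """Exhaustively pick the threshold rule with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(data[0][0])):
        values = sorted({x[feature] for x, _ in data})
        # Candidate thresholds are midpoints between consecutive observed values.
        for threshold in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for (x, y), w in zip(data, weights)
                          if stump_predict(stump, x) != y)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(data, rounds):
    """Standard discrete AdaBoost; returns a list of (alpha, stump) pairs."""
    weights = [1.0 / len(data)] * len(data)
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified residues, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def score(ensemble, x):
    """Weighted vote of all rules; larger means more likely catalytic."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)


def sensitivity_and_fpr(ensemble, data, cutoff=0.0):
    """Sensitivity and false positive rate when predicting positive if score > cutoff."""
    tp = fn = fp = tn = 0
    for x, y in data:
        predicted_positive = score(ensemble, x) > cutoff
        if y == 1:
            tp, fn = tp + predicted_positive, fn + (not predicted_positive)
        else:
            fp, tn = fp + predicted_positive, tn + (not predicted_positive)
    return tp / (tp + fn), fp / (fp + tn)


model = adaboost(RESIDUES, rounds=3)
for cutoff in (-2.0, 0.0, 1.0):
    print(cutoff, sensitivity_and_fpr(model, RESIDUES, cutoff))
```

Lowering the cutoff admits more residues as predicted catalytic, raising sensitivity at the cost of more false positives, while raising it does the reverse. The numbers printed here are measured on the toy training data itself, so they say nothing about the cross-validated performance reported in the abstract.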