Active site prediction using evolutionary and structural information
Open Access
- 14 January 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (5) , 617-624
- https://doi.org/10.1093/bioinformatics/btq008
Abstract
Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites.
Results: In cross-validation experiments on two benchmark datasets from the Catalytic Site Atlas and CATRES resources containing a total of 437 manually curated enzymes spanning 487 SCOP families, Discern increases catalytic site recall between 12% and 20% over methods that combine information from both sequence and structure, and by ≥50% over methods that make use of sequence conservation signal only. Controlled experiments show that Discern's improvement in catalytic residue prediction is derived from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting.
Contact: kimmen@berkeley.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
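The precision and recall definitions given in the abstract can be illustrated with a minimal Python sketch. The function name `precision_recall` and the residue numbers below are hypothetical, chosen only for illustration; they are not taken from the paper or its benchmarks, and this is not the evaluation code used for Discern.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for a set of predicted catalytic residues.

    precision: fraction of predicted catalytic residues that are catalytic.
    recall:    fraction of catalytic residues that were identified.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical residue positions for a single enzyme (illustrative only).
predicted_residues = {12, 45, 57, 102, 195}
catalytic_residues = {57, 102, 195, 214}

precision, recall = precision_recall(predicted_residues, catalytic_residues)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy case three of the five predictions are correct and three of the four true catalytic residues are recovered, so the two measures differ; a predictor can trade one for the other by changing how many residues it reports.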