A Machine Learning-Based Method To Improve Docking Scoring Functions and Its Application to Drug Repurposing
- 3 February 2011
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling
- Vol. 51 (2) , 408-419
- https://doi.org/10.1021/ci100369f
Abstract
Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how the use of support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.
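The abstract does not spell out the multiple-planar SVM training procedure, so the following is only a minimal sketch of one common way to handle an excess of negative examples, not the authors' method. It assumes scikit-learn and NumPy, splits the decoys into subsets about the size of the active set, trains one nonlinear (RBF-kernel) SVM per subset, and combines them by majority vote. The function names, the four-column feature layout standing in for docking energy terms, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def train_balanced_svm_ensemble(X_pos, X_neg, seed=0):
    """Train one RBF-kernel SVM per balanced subset of the negatives."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_neg))
    # Split the shuffled decoys into chunks roughly the size of the active set.
    n_chunks = max(1, len(X_neg) // len(X_pos))
    models = []
    for chunk in np.array_split(order, n_chunks):
        X = np.vstack([X_pos, X_neg[chunk]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(chunk))])
        models.append(SVC(kernel="rbf", gamma="scale").fit(X, y))
    return models


def predict_active(models, X):
    """Majority vote across the ensemble: 1 = predicted active, 0 = decoy."""
    votes = np.mean([m.predict(X) for m in models], axis=0)
    return (votes >= 0.5).astype(int)


# Synthetic stand-in for per-compound docking energy terms (NOT real data).
rng = np.random.default_rng(42)
X_actives = rng.normal(loc=-2.0, scale=0.5, size=(20, 4))
X_decoys = rng.normal(loc=2.0, scale=0.5, size=(200, 4))

ensemble = train_balanced_svm_ensemble(X_actives, X_decoys)
predictions = predict_active(ensemble, np.vstack([X_actives[:3], X_decoys[:3]]))
```

With 20 actives and 200 decoys, the sketch trains ten SVMs, each on a balanced 20/20 subset, so no single model is dominated by the negative class. A regression counterpart for IC50 values could follow the same pattern with `sklearn.svm.SVR` in place of `SVC`.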