A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking
Open Access
- 17 March 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (9) , 1169-1175
- https://doi.org/10.1093/bioinformatics/btq112
Abstract
Motivation: Accurately predicting the binding affinities of large sets of diverse protein–ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex, which also include parameters fitted to experimental or simulation data, and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions.

Results: We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.

Contact: pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
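The two ideas in the abstract, a fixed functional form versus a non-parametric learner and the use of resampling to detect overfitting, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (not PDBbind), the single feature `x` stands in for a descriptor of a complex, the function names are hypothetical, and a k-nearest-neighbour regressor is swapped in for Random Forest to keep the example dependency-free. It is not the RF-Score implementation.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k disjoint, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_linear(xs, ys):
    """Least-squares fit of y = a*x + b: a predetermined functional form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_knn(xs, ys, k=3):
    """Non-parametric k-nearest-neighbour regressor.

    Assumption: used here only as a simple stand-in for Random Forest.
    """
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def cross_validated_rmse(fit, xs, ys, k=5, seed=0):
    """Root-mean-square error measured only on held-out folds."""
    sq_err = 0.0
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        sq_err += sum((model(xs[i]) - ys[i]) ** 2 for i in fold)
    return math.sqrt(sq_err / len(xs))


# Synthetic, deliberately non-linear relationship (hypothetical data).
xs = [i / 20 - 1 for i in range(41)]
ys = [x * x for x in xs]

rmse_linear = cross_validated_rmse(fit_linear, xs, ys)
rmse_knn = cross_validated_rmse(fit_knn, xs, ys)
print(f"linear form, 5-fold CV RMSE: {rmse_linear:.3f}")
print(f"kNN (non-parametric), 5-fold CV RMSE: {rmse_knn:.3f}")
```

Because the error is computed only on points the model never saw during fitting, a model that merely memorises its calibration data gains nothing from doing so. On this toy data the linear form cannot represent the curvature, so its held-out error stays higher than that of the non-parametric regressor.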