Ensemble Methods for Classification in Cheminformatics
- 13 October 2004
- journal article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 44 (6), 1971-1978
- https://doi.org/10.1021/ci049850e
Abstract
We describe the application of ensemble methods to binary classification problems on two pharmaceutical compound data sets. Several variants of single and ensemble models of k-nearest neighbors classifiers, support vector machines (SVMs), and single ridge regression models are compared. All methods exhibit robust classification even when more features are given than observations. On two data sets dealing with specific properties of drug-like substances (cytochrome P450 inhibition and "Frequent Hitters", i.e., unspecific protein inhibition), we achieve classification rates above 90%. We are able to reduce the cross-validated misclassification rate for the Frequent Hitters problem by a factor of 2 compared to previous results obtained for the same data set with different modeling techniques.
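The abstract does not say how the ensembles were constructed, so the following is only a minimal sketch of one common approach: bagging k-nearest neighbors classifiers and combining them by majority vote. The function names, the synthetic data, and all parameter values (`n_models`, `k`, the seeds) are assumptions made for illustration and are not taken from the article. The toy data has more features (20) than observations (10), mirroring the setting the abstract mentions.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=5, n_features=20, seed=1):
    """Hypothetical synthetic data: class 0 centered at 0, class 1 centered at 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(float(label), 0.3) for _ in range(n_features)])
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=3):
    """Single k-NN model: majority label among the k nearest training points."""
    # Squared Euclidean distance from x to every training point, smallest first.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def bagged_knn_predict(train_X, train_y, x, n_models=25, k=3, seed=0):
    """Ensemble: each k-NN model sees a bootstrap resample; labels are majority-voted."""
    rng = random.Random(seed)
    n = len(train_X)
    votes = Counter()
    for _ in range(n_models):
        # Bootstrap sample: draw n training indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_X = [train_X[i] for i in idx]
        boot_y = [train_y[i] for i in idx]
        votes[knn_predict(boot_X, boot_y, x, k)] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X, y = make_toy_data()
    print(bagged_knn_predict(X, y, [0.9] * 20))  # query placed near the class-1 center
    print(bagged_knn_predict(X, y, [0.1] * 20))  # query placed near the class-0 center
```

An odd `n_models` and an odd `k` avoid tied votes in the binary case. Other ensemble constructions, such as training each member on a random subset of features, would fit the same voting scheme.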