SVM-Based Feature Selection for Characterization of Focused Compound Collections
- 3 March 2004
- journal article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 44 (3), 993-999
- https://doi.org/10.1021/ci0342876
Abstract
Artificial neural networks, the support vector machine (SVM), and other machine learning methods for the classification of molecules are often considered as a “black box”, since the molecular features that are most relevant for a given classifier are usually not presented in a human-interpretable form. We report on an SVM-based algorithm for the selection of relevant molecular features from a trained classifier that might be important for an understanding of ligand-receptor interactions. The original SVM approach was extended to allow for feature selection. The method was applied to characterize focused libraries of enzyme inhibitors. A comparison with classical Kolmogorov-Smirnov (KS)-based feature selection was performed. In most of the applications the SVM method showed sustained classification accuracy, thereby relying on a smaller number of molecular features than KS-based classifiers. In one case both methods produced comparable results. Limiting the calculation of descriptors to only the most relevant ones for a certain biological activity can also be used to speed up high-throughput virtual screening.
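The abstract does not say how the SVM was extended for feature selection, so the following is only an illustrative sketch of the two general ideas being compared, not the authors' algorithm. It ranks descriptors in two ways: by the two-sample KS statistic between active and inactive compounds (a filter approach), and by the absolute weight each descriptor receives in a trained linear SVM (a swapped-in, commonly used stand-in for SVM-based selection). The data are synthetic, and all names (`ks_statistic`, `train_linear_svm`, `make_toy_data`) and parameter values are assumptions made for illustration.

```python
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=50, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on the
    L2-regularized hinge loss (labels must be +1 / -1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wk * xk for wk, xk in zip(w, X[i])) + b)
            # Shrink weights (gradient of the L2 penalty).
            w = [wk * (1.0 - lr * lam) for wk in w]
            if margin < 1.0:
                # Sample violates the margin: step along the hinge subgradient.
                w = [wk + lr * y[i] * xk for wk, xk in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def make_toy_data(n_per_class=100, seed=0):
    """Synthetic 'descriptors': columns 0 and 1 differ between classes,
    columns 2 and 3 are pure noise (an assumption for illustration)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            X.append([
                rng.gauss(1.5 * label, 1.0),
                rng.gauss(1.5 * label, 1.0),
                rng.gauss(0.0, 1.0),
                rng.gauss(0.0, 1.0),
            ])
            y.append(label)
    return X, y


X, y = make_toy_data()
n_features = len(X[0])

# Filter ranking: KS statistic between the two classes, per descriptor.
ks_scores = [
    ks_statistic(
        [row[k] for row, lab in zip(X, y) if lab == 1],
        [row[k] for row, lab in zip(X, y) if lab == -1],
    )
    for k in range(n_features)
]

# Classifier-based ranking: absolute weight in the trained linear SVM.
w, b = train_linear_svm(X, y)
svm_scores = [abs(wk) for wk in w]

ks_rank = sorted(range(n_features), key=lambda k: -ks_scores[k])
svm_rank = sorted(range(n_features), key=lambda k: -svm_scores[k])
print("KS ranking: ", ks_rank)
print("SVM ranking:", svm_rank)
```

In this toy setting both rankings should place the two informative columns ahead of the noise columns. Keeping only the top-ranked descriptors before retraining mirrors the abstract's point that computing fewer descriptors can speed up virtual screening.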