SVM-Based Feature Selection for Characterization of Focused Compound Collections

Abstract
Artificial neural networks, support vector machines (SVMs), and other machine learning methods for classifying molecules are often regarded as "black boxes", since the molecular features most relevant to a given classifier are usually not presented in a human-interpretable form. We report an SVM-based algorithm that selects relevant molecular features from a trained classifier; such features might be important for understanding ligand–receptor interactions. The original SVM approach was extended to allow for feature selection. The method was applied to characterize focused libraries of enzyme inhibitors and was compared with classical Kolmogorov-Smirnov (KS)-based feature selection. In most applications, the SVM method maintained classification accuracy while relying on fewer molecular features than the KS-based classifiers. In one case both methods produced comparable results. Restricting descriptor calculation to the features most relevant for a given biological activity can also speed up high-throughput virtual screening.
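As a rough illustration of the KS-based baseline mentioned above, the following minimal Python sketch ranks descriptors by the two-sample Kolmogorov-Smirnov statistic between active and inactive compounds. It is not the authors' implementation: the function names, the descriptor names (logP, mol_weight, h_bond_donors), and the toy values are all assumptions made up for this example, and the SVM-based selection itself is not reproduced here.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap can only occur at an observed value.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def rank_features_by_ks(actives, inactives, names):
    """Return (name, KS) pairs sorted from most to least discriminating."""
    scores = []
    for j, name in enumerate(names):
        col_a = [row[j] for row in actives]
        col_b = [row[j] for row in inactives]
        scores.append((name, ks_statistic(col_a, col_b)))
    return sorted(scores, key=lambda t: t[1], reverse=True)


# Hypothetical toy data: one row per compound, one column per descriptor.
names = ["logP", "mol_weight", "h_bond_donors"]
actives = [[3.1, 410.0, 2], [3.4, 395.0, 1], [2.9, 430.0, 2], [3.6, 402.0, 3]]
inactives = [[1.0, 405.0, 2], [1.4, 420.0, 1], [0.8, 398.0, 3], [1.2, 415.0, 2]]

ranking = rank_features_by_ks(actives, inactives, names)
for name, score in ranking:
    print(f"{name}: KS = {score:.2f}")
```

A KS value near 1 means the descriptor's distributions in the two classes barely overlap; a value near 0 means the descriptor carries little class information on its own. Keeping only the top-ranked descriptors is one simple way to reduce the number of features that must be computed during screening.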
