Novel Variable Selection Quantitative Structure−Property Relationship Approach Based on the k-Nearest-Neighbor Principle

Abstract
A novel automated variable selection quantitative structure−activity relationship (QSAR) method based on the k-nearest-neighbor principle (kNN-QSAR) has been developed. The kNN-QSAR method formally implements the active analogue approach, which holds that similar compounds display similar profiles of pharmacological activity. The activity of each compound is predicted as the average activity of the k compounds in the data set that are most chemically similar to it. The robustness of a QSAR model is characterized by its leave-one-out cross-validated R² (q²). Chemical structures are characterized by multiple topological descriptors, such as molecular connectivity indices or atom pairs. Chemical similarity is evaluated as the Euclidean distance between compounds in the multidimensional descriptor space, and the optimal subset of descriptors is selected using simulated annealing as a stochastic optimization algorithm. Applying the kNN-QSAR method to 58 estrogen receptor ligands, as well as to several other groups of pharmacologically active compounds, yielded QSAR models with q² values of 0.6 or higher. Because of its relative simplicity, high degree of automation, nonlinear nature, and computational efficiency, this method could be applied routinely to a wide variety of experimental data sets.
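The three ingredients described in the abstract (kNN prediction of activity, leave-one-out q², and simulated-annealing selection of a descriptor subset) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`knn_predict`, `loo_q2`, `anneal_select`), the fixed subset size, the one-descriptor swap move, the Metropolis acceptance rule, the geometric cooling schedule, and all default parameters are hypothetical choices. The prediction is a plain unweighted mean of the k neighbors' activities, and q² is computed as 1 − PRESS / Σ(yᵢ − ȳ)², where PRESS = Σ(yᵢ − ŷᵢ)² sums the squared errors of the leave-one-out predictions ŷᵢ. The demo data are synthetic.

```python
import math
import random


def euclidean(a, b, cols):
    """Euclidean distance between two descriptor vectors, using only the selected columns."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))


def knn_predict(X, y, i, cols, k):
    """Predict y[i] as the plain (unweighted) mean activity of the k nearest
    other compounds; compound i itself is excluded, as in leave-one-out."""
    dists = sorted((euclidean(X[i], X[j], cols), j) for j in range(len(X)) if j != i)
    neighbors = [j for _, j in dists[:k]]
    return sum(y[j] for j in neighbors) / len(neighbors)


def loo_q2(X, y, cols, k):
    """Leave-one-out cross-validated R^2: q2 = 1 - PRESS / SS_total."""
    mean_y = sum(y) / len(y)
    press = sum((y[i] - knn_predict(X, y, i, cols, k)) ** 2 for i in range(len(y)))
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total


def anneal_select(X, y, n_select, k=3, t0=1.0, cooling=0.9, steps=200, seed=0):
    """Search descriptor subsets of a fixed size with simulated annealing, maximizing q2.

    Assumptions (hypothetical, not from the source): fixed subset size, a
    one-descriptor swap as the move, Metropolis acceptance, geometric cooling.
    """
    n_desc = len(X[0])
    if not 0 < n_select < n_desc:
        raise ValueError("n_select must be between 1 and n_desc - 1")
    rng = random.Random(seed)
    current = rng.sample(range(n_desc), n_select)
    cur_q2 = loo_q2(X, y, current, k)
    best, best_q2 = list(current), cur_q2
    t = t0
    for _ in range(steps):
        # Propose a neighboring subset: replace one selected descriptor with an unused one.
        candidate = list(current)
        unused = [d for d in range(n_desc) if d not in candidate]
        candidate[rng.randrange(n_select)] = rng.choice(unused)
        cand_q2 = loo_q2(X, y, candidate, k)
        # Metropolis rule: always accept improvements; accept a worse subset with
        # probability exp(delta / t), which shrinks as the temperature falls.
        if cand_q2 >= cur_q2 or rng.random() < math.exp((cand_q2 - cur_q2) / t):
            current, cur_q2 = candidate, cand_q2
            if cur_q2 > best_q2:
                best, best_q2 = list(current), cur_q2
        t = max(t * cooling, 1e-12)  # geometric cooling with a floor to avoid dividing by zero
    return sorted(best), best_q2


if __name__ == "__main__":
    # Synthetic data: activity depends on descriptors 0 and 1 only; 2-5 are noise.
    data_rng = random.Random(1)
    X = [[data_rng.random() for _ in range(6)] for _ in range(30)]
    y = [2.0 * row[0] + row[1] for row in X]
    cols, q2 = anneal_select(X, y, n_select=2)
    print("selected descriptors:", cols, "q2 =", round(q2, 3))
```

Restricting the distance to the selected columns is what couples variable selection to the kNN similarity measure: each candidate subset defines a different descriptor space, and q² scores how well neighbors in that space share activity. With the synthetic data one would hope descriptors 0 and 1 score well, but the search is stochastic and is not guaranteed to find them.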