VSMP: A Novel Variable Selection and Modeling Method Based on the Prediction
- 4 April 2003
- journal article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 43 (3), 964-969
- https://doi.org/10.1021/ci020377j
Abstract
The use of numerous descriptors that are indicative of molecular structure and topology is becoming more common in quantitative structure-activity relationship (QSAR) studies. Choosing adequate descriptors for QSAR studies is important but difficult because no absolute rules govern the choice. A variety of variable selection techniques, including stepwise regression, partial least squares/principal component analysis (PLS/PCA), neural networks, and evolutionary algorithms such as the genetic algorithm, have been applied to this common problem. All-subsets regression (ASR) is capable of finding the best variable subset from a large pool. In this paper, a novel variable selection and modeling method based on prediction, VSMP for short, is developed. Two controllable parameters, the interrelation coefficient between pairs of independent variables (r_int) and the correlation coefficient obtained using the leave-one-out (LOO) cross-validation technique (q²), are introduced into ASR to improve its performance. The technique differs from other ASR-related variable selection procedures in two main features: (1) The search for optimal subsets is controlled by the statistic q² or the root-mean-square error (RMSEP) in the LOO cross-validation step rather than by the correlation coefficient obtained in the modeling step (r²). (2) The search for all optimal subsets is accelerated by using the statistic r_int together with q². A comparison of the results of VSMP applied to the Selwood data set (n = 31 compounds, m = 53 descriptors) with those obtained from alternative algorithms shows the good performance of the technique.
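The two controls described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `loo_q2` and `select_subset`, the fixed subset size `k`, and the default threshold `r_int_max=0.75` are assumptions made for illustration. The sketch uses r_int only to skip subsets that contain a highly intercorrelated descriptor pair and uses LOO q² only to rank the remaining subsets; the abstract does not say exactly how q² is also used to speed up the search, so that part is left out.

```python
from itertools import combinations

import numpy as np


def loo_q2(X, y):
    """Leave-one-out q^2 for ordinary least squares with an intercept.

    Uses the closed-form LOO residual e_i / (1 - h_ii), so no model is
    refitted. Assumes no observation has leverage h_ii equal to 1.
    """
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)            # hat matrix of the full fit
    resid = y - H @ y                    # ordinary residuals
    loo_resid = resid / (1.0 - np.diag(H))
    press = np.sum(loo_resid ** 2)       # predictive residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot


def select_subset(X, y, k, r_int_max=0.75):
    """Exhaustive search over k-descriptor subsets (hypothetical helper).

    Subsets containing any descriptor pair with |r| > r_int_max are
    skipped; the survivors are ranked by LOO q^2, not by the fitted r^2.
    Returns (best_q2, best_columns).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    best_q2, best_cols = -np.inf, None
    for cols in combinations(range(X.shape[1]), k):
        # r_int filter: reject the subset before any regression is run.
        if any(corr[i, j] > r_int_max for i, j in combinations(cols, 2)):
            continue
        q2 = loo_q2(X[:, list(cols)], y)
        if q2 > best_q2:
            best_q2, best_cols = q2, cols
    return best_q2, best_cols
```

Calling `select_subset(X, y, k=3)` on a descriptor matrix `X` of shape (n, m) and an activity vector `y` of length n returns the best q² found and the column indices of the corresponding subset. Rejecting intercorrelated pairs before fitting is what makes the filter cheap: the pairwise correlation matrix is computed once, and each rejected subset costs no regression.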