Genetic Algorithm Guided Selection: Variable Selection and Subset Selection
- 23 May 2002
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 42 (4), 927-936
- https://doi.org/10.1021/ci010247v
Abstract
A novel Genetic Algorithm guided Selection method, GAS, has been described. The method utilizes a simple encoding scheme which can represent both compounds and variables used to construct a QSAR/QSPR model. A genetic algorithm is then utilized to simultaneously optimize the encoded variables that include both descriptors and compound subsets. The GAS method generates multiple models each applying to a subset of the compounds. Typically the subsets represent clusters with different chemotypes. Also a procedure based on molecular similarity is presented to determine which model should be applied to a given test set compound. The variable selection method implemented in GAS has been tested and compared using the Selwood data set (n = 31 compounds; v = 53 descriptors). The results showed that the method is comparable to other published methods. The subset selection method implemented in GAS has been first tested using an artificial data set (n = 100 points; v = 1 descriptor) to examine its ability to subset data points and second applied to analyze the XLOGP data set (n = 1831 compounds; v = 126 descriptors). The method is able to correctly identify artificial data points belonging to various subsets. The analysis of the XLOGP data set shows that the subset selection method can be useful in improving a QSAR/QSPR model when the variable selection method fails.
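The abstract does not spell out the encoding or the genetic operators, so the following is only a minimal sketch of the general idea under stated assumptions, not the GAS implementation. It assumes a chromosome made of one 0/1 gene per descriptor followed by one subset label per compound, a fitness equal to the negative summed squared error of per-subset least-squares fits, and a plain GA with tournament selection, uniform crossover, and elitism. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def random_chromosome(rng, n_compounds, n_descriptors, n_subsets):
    # Assumed encoding: descriptor on/off bits, then a subset label per compound.
    mask = rng.integers(0, 2, size=n_descriptors)
    labels = rng.integers(0, n_subsets, size=n_compounds)
    return np.concatenate([mask, labels])

def fitness(chrom, X, y, n_subsets):
    # Negative total squared error of one linear model per compound subset.
    n_desc = X.shape[1]
    mask = chrom[:n_desc].astype(bool)
    labels = chrom[n_desc:]
    if not mask.any():
        return -np.inf  # no descriptor selected
    sse = 0.0
    for s in range(n_subsets):
        rows = labels == s
        if rows.sum() < mask.sum() + 2:
            return -np.inf  # too few compounds to fit this subset
        A = np.column_stack([X[rows][:, mask], np.ones(rows.sum())])
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        sse += float(np.sum((A @ coef - y[rows]) ** 2))
    return -sse

def tournament(pop, fits, rng):
    # Binary tournament: the fitter of two random individuals wins.
    i, j = rng.integers(0, len(pop), size=2)
    return pop[i] if fits[i] >= fits[j] else pop[j]

def mutate(chrom, rng, n_desc, n_subsets, p_mut):
    # Flip descriptor bits and redraw subset labels with probability p_mut.
    child = chrom.copy()
    hit = rng.random(child.size) < p_mut
    child[:n_desc] = np.where(hit[:n_desc], 1 - child[:n_desc], child[:n_desc])
    new_labels = rng.integers(0, n_subsets, size=child.size - n_desc)
    child[n_desc:] = np.where(hit[n_desc:], new_labels, child[n_desc:])
    return child

def evolve(X, y, n_subsets=2, pop_size=40, generations=50, p_mut=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n, v = X.shape
    pop = [random_chromosome(rng, n, v, n_subsets) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fits = np.array([fitness(c, X, y, n_subsets) for c in pop])
        elite = pop[int(np.argmax(fits))]
        history.append(float(fits.max()))
        children = [elite.copy()]  # elitism: the best individual survives
        while len(children) < pop_size:
            a = tournament(pop, fits, rng)
            b = tournament(pop, fits, rng)
            child = np.where(rng.random(a.size) < 0.5, a, b)  # uniform crossover
            children.append(mutate(child, rng, v, n_subsets, p_mut))
        pop = children
    return elite, history

# Toy data (hypothetical, not the paper's artificial set): two lines, one descriptor.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(0.0, 1.0, size=(40, 1))
true_group = np.arange(40) % 2
y = np.where(true_group == 0, 2.0 * X[:, 0] + 1.0, -3.0 * X[:, 0] + 5.0)
best, history = evolve(X, y)
```

Here `best[:1]` holds the descriptor bit and `best[1:]` the per-compound subset labels, while `history` records the best fitness in each generation. Because subset labels are interchangeable, uniform crossover over the label genes can be disruptive; a practical implementation would likely need a more careful operator, and the abstract does not say which one GAS uses.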
This publication has 13 references indexed in Scilit:
- Development and Validation of a Novel Variable Selection Technique with Application to Multidimensional Quantitative Structure−Activity Relationship Studies. Journal of Chemical Information and Computer Sciences, 1999
- Comparison of Reliability of log P Values for Drugs Calculated by Several Methods. Chemical & Pharmaceutical Bulletin, 1994
- Variable Selection in QSAR Studies. II. A Highly Efficient Combination of Systematic Search and Evolution. Quantitative Structure-Activity Relationships, 1994
- On Identifying Likely Determinants of Biological Activity in High Dimensional QSAR Problems. Quantitative Structure-Activity Relationships, 1994
- Genetic Algorithms: Principles of Natural Selection Applied to Computation. Science, 1993
- Genetic Algorithms. Scientific American, 1992
- Structure-activity relationships of antifilarial antimycin analogs: a multivariate pattern recognition study. Journal of Medicinal Chemistry, 1990
- Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 1985
- Deoxyribonucleoside-3′,5′ Cyclic Phosphates. Synthesis and Acid-Catalyzed and Enzymic Hydrolysis. Journal of the American Chemical Society, 1964
- The Correlation of Biological Activity of Plant Growth Regulators and Chloromycetin Derivatives with Hammett Constants and Partition Coefficients. Journal of the American Chemical Society, 1963