On Combining Recursive Partitioning and Simulated Annealing To Detect Groups of Biologically Active Compounds
- 5 February 2002
- journal article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 42 (2), 393-404
- https://doi.org/10.1021/ci0101049
Abstract
Statistical data mining methods have proven to be powerful tools for investigating correlations between molecular structure and biological activity. Recursive partitioning (RP), in particular, offers several advantages in mining large, diverse data sets resulting from high throughput screening. When used with binary molecular descriptors, the standard implementation of RP splits on single descriptors. We use simulated annealing (SA) to find combinations of molecular descriptors whose simultaneous presence best separates off the most active, chemically similar group of compounds. The search is incorporated into a recursive partitioning design to produce a regression tree for biological activity on the space of structural fingerprints. Each node is characterized by a specific combination of structural features, and the terminal nodes with high average activities correspond, roughly, to different classes of compounds. Using LeadScope structural features as descriptors to mine a database from the National Cancer Institute, the merging of RP and SA consistently identifies structurally homogeneous classes of highly potent anticancer agents.
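The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the objective function, the move set, or the cooling schedule, so everything here is an assumption. The score (mean-activity lift of the group times the square root of its size), the one-descriptor swap move, the geometric cooling, and the names `score`, `anneal_combo`, `grow_tree`, `PENALTY`, `X_toy`, and `y_toy` are all hypothetical. The toy data enumerates every 6-bit fingerprint and marks a compound as active only when descriptors 1 and 4 are both present.

```python
import itertools
import math
import random

PENALTY = -1e9  # assumed score for groups too small to be meaningful


def group_members(X, combo):
    """Indices of compounds that carry every descriptor in combo."""
    return [i for i, row in enumerate(X) if all(row[j] for j in combo)]


def score(X, y, combo, min_group=2):
    """Assumed objective: activity lift of the group, weighted by sqrt(size)."""
    members = group_members(X, combo)
    if len(members) < min_group:
        return PENALTY
    overall = sum(y) / len(y)
    mean_in = sum(y[i] for i in members) / len(members)
    return (mean_in - overall) * math.sqrt(len(members))


def anneal_combo(X, y, k=2, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over k-descriptor combinations (requires k < n_desc)."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    current = rng.sample(range(n_desc), k)
    cur_score = score(X, y, current)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(steps):
        # Neighbor move: swap one descriptor in the combination for one outside it.
        candidate = list(current)
        outside = [j for j in range(n_desc) if j not in current]
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_score = score(X, y, candidate)
        delta = cand_score - cur_score
        # Metropolis rule: keep improvements, occasionally keep worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = candidate, cand_score
        if cur_score > best_score:
            best, best_score = list(current), cur_score
        temp *= cooling  # geometric cooling (assumed schedule)
    return sorted(best), best_score


def grow_tree(X, y, k=2, depth=2, min_node=8):
    """Recursive partitioning in which each split is an annealed combination."""
    node = {"n": len(y), "mean": sum(y) / len(y)}
    if depth == 0 or len(y) < min_node:
        return node
    combo, s = anneal_combo(X, y, k=k)
    if s <= 0:  # no combination separates a more active group
        return node
    inside = set(group_members(X, combo))
    node["combo"] = combo
    # "present": compounds with all features in combo; "absent": the rest.
    for name, keep in (("present", True), ("absent", False)):
        rows = [i for i in range(len(y)) if (i in inside) == keep]
        node[name] = grow_tree([X[i] for i in rows], [y[i] for i in rows],
                               k, depth - 1, min_node)
    return node


# Toy data: all 64 six-bit fingerprints; active only if bits 1 and 4 are set.
X_toy = [list(bits) for bits in itertools.product([0, 1], repeat=6)]
y_toy = [10.0 if row[1] and row[4] else 1.0 for row in X_toy]
print(anneal_combo(X_toy, y_toy))  # ([1, 4], 27.0)
```

On this toy set, `grow_tree(X_toy, y_toy)` splits once on the combination `[1, 4]` and leaves two terminal nodes: 16 compounds with mean activity 10.0 and 48 with mean activity 1.0. Real screening data would need a tuned schedule and a considered choice of the combination size `k`.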