Sequential Approach for Identifying Lead Compounds in Large Chemical Databases
Open Access
- 1 May 2001
- journal article
- Published by Institute of Mathematical Statistics in Statistical Science
- Vol. 16 (2), 154-168
- https://doi.org/10.1214/ss/1009213288
Abstract
At the early stage of drug discovery, many thousands of chemical compounds can be synthesized and tested (assayed) for potency (activity) with high throughput screening (HTS). With ever-increasing numbers of compounds to be tested (now often in the neighborhood of 500,000) it remains a challenge to find strategies via sequential design that reduce costs while locating classes of active compounds. Initial screening of a modest number of selected compounds (first-stage) is used to construct a structure-activity relationship (SAR). Based on this model, a second-stage sample is selected, the SAR updated and, if no more sampling is done, the activities of not yet tested compounds are predicted. Instead of stopping, the SAR could be used to determine another stage of sampling after which the SAR is updated and the process repeated. We use existing data on the potency and chemical structure of 70,223 compounds to investigate various sequential testing schemes. Evidence on two assays supports the conclusion that a rather small number of samples selected according to the proposed scheme can more than triple the rate at which active compounds are identified and also produce SARs effective for identifying chemical structure. A different set of 52,883 compounds is used to confirm our findings. One surprising conclusion of the study is that the design of the initial sample stage may be unimportant: random selection or systematic methods based on chemical structures are equally effective.
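The staged loop described in the abstract (screen a first-stage sample, fit an SAR, pick the next stage from the SAR's predictions, refit, repeat) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the library is synthetic, the function names are hypothetical, and the SAR is a simple log-ratio scoring of binary descriptors that stands in for whatever model the paper actually uses. It is not the authors' method or data.

```python
import math
import random

def make_library(n, n_features, rng):
    """Synthetic compounds: binary descriptor vectors plus a hidden activity flag."""
    library = []
    for _ in range(n):
        x = [rng.random() < 0.3 for _ in range(n_features)]
        # Assumed ground truth: compounds with descriptors 0 and 1 are often active.
        p_active = 0.5 if (x[0] and x[1]) else 0.01
        library.append((x, rng.random() < p_active))
    return library

def fit_sar(tested, n_features):
    """Toy SAR: log ratio of each descriptor's smoothed active rate to the base rate."""
    base = (sum(a for _, a in tested) + 1) / (len(tested) + 2)
    weights = []
    for j in range(n_features):
        hits = [a for x, a in tested if x[j]]
        rate = (sum(hits) + 1) / (len(hits) + 2)
        weights.append(math.log(rate / base))
    return weights

def predict(x, weights):
    """Score a compound by summing the weights of the descriptors it has."""
    return sum(w for present, w in zip(x, weights) if present)

def sequential_screen(library, first_stage, per_stage, n_stages, rng):
    """Random first stage, then SAR-guided stages; returns tested indices and hits per stage."""
    n_features = len(library[0][0])
    untested = list(range(len(library)))
    rng.shuffle(untested)
    tested_idx = untested[:first_stage]      # first stage: random selection
    untested = untested[first_stage:]
    stage_hits = []
    for _ in range(n_stages):
        # Refit the SAR on everything assayed so far.
        weights = fit_sar([library[i] for i in tested_idx], n_features)
        # Rank untested compounds by predicted activity and assay the top batch.
        untested.sort(key=lambda i: predict(library[i][0], weights), reverse=True)
        batch, untested = untested[:per_stage], untested[per_stage:]
        stage_hits.append(sum(library[i][1] for i in batch))
        tested_idx.extend(batch)
    return tested_idx, stage_hits

rng = random.Random(0)
library = make_library(5000, 8, rng)
tested_idx, stage_hits = sequential_screen(library, 500, 250, 2, rng)
base_rate = sum(a for _, a in library) / len(library)
later_rate = sum(stage_hits) / (2 * 250)
print(f"library active rate: {base_rate:.3f}")
print(f"hit rate in SAR-guided stages: {later_rate:.3f}")
```

Comparing the two printed rates shows how much the SAR-guided stages enrich for actives relative to testing compounds at random in this toy setting. The size of that enrichment depends entirely on the assumed ground truth and says nothing about the figures reported in the paper.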