Data Shaving: A Focused Screening Approach

23 January 2004

journal article
Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences

Vol. 44 (2), 470-479
https://doi.org/10.1021/ci030025s

Abstract

The number of compounds available for evaluation in the drug discovery process continues to increase. These compounds may exist physically or be stored electronically, allowing screening by either actual or virtual means. This growing number of compounds has created an increasing need for effective strategies to direct screening efforts. Initial efforts toward this goal led to methods for selecting diverse sets of compounds for screening, methods for clustering actives into related groups of compounds, and tools for selecting compounds similar to actives of interest for further screening. In this work we extend these earlier efforts by exploiting information about inactive compounds to help make rational decisions about which sets of compounds to include in a continuing screening campaign or in a focused follow-up effort. The method uses the information from inactive compounds to "shave" off, or deprioritize, compounds similar to inactives from further consideration. The methodology can be used in two ways: first, as a rational means of deciding when sufficient compounds containing certain structural features have been tested, and second, as a tool to enhance similarity searching around known actives. Similarity searching is improved by deprioritizing compounds predicted to be inactive because they contain structural features associated with inactivity.
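The two uses described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the descriptors, thresholds, or scoring used, so the code below assumes compounds are represented as sets of structural-feature labels, uses a hypothetical `min_tested` cutoff to decide when a feature has been tested often enough, and uses Tanimoto similarity for the search around a known active. All names and data are illustrative.

```python
from collections import Counter


def shaved_features(tested, min_tested=3):
    """Return features seen in at least `min_tested` tested compounds,
    none of which were active (hypothetical rule, for illustration only).

    tested: iterable of (feature_set, is_active) pairs.
    """
    seen, active = Counter(), Counter()
    for features, is_active in tested:
        for f in features:
            seen[f] += 1
            if is_active:
                active[f] += 1
    return {f for f, n in seen.items() if n >= min_tested and active[f] == 0}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(query, candidates, shaved):
    """Rank untested candidates by similarity to a known active `query`,
    moving any candidate that contains a shaved feature to the end.

    candidates: dict mapping compound name -> feature set.
    """
    def sort_key(item):
        name, feats = item
        is_shaved = bool(feats & shaved)
        # Unshaved first, then by decreasing similarity, then by name.
        return (is_shaved, -tanimoto(query, feats), name)

    return [name for name, _ in sorted(candidates.items(), key=sort_key)]


# Illustrative screening results: (features, was the compound active?)
tested = [
    ({"amide", "phenyl"}, True),
    ({"nitro", "phenyl"}, False),
    ({"nitro", "ester"}, False),
    ({"nitro", "amide"}, False),
    ({"ester", "phenyl"}, False),
]
shaved = shaved_features(tested, min_tested=3)

query_active = {"amide", "phenyl"}
candidates = {
    "c1": {"amide", "phenyl", "nitro"},
    "c2": {"amide", "phenyl", "ester"},
    "c3": {"phenyl"},
}
ranking = rank_candidates(query_active, candidates, shaved)
print(shaved, ranking)
```

In this toy data, "nitro" appears in three tested compounds with no actives, so it is the only shaved feature. Candidate c1 is as similar to the query as c2 but drops to the end of the ranking because it carries that feature. Sorting shaved candidates last, instead of discarding them, mirrors the abstract's wording of deprioritizing compounds, not excluding them.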
