Deriving Knowledge through Data Mining High-Throughput Screening Data
- 20 October 2004
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Medicinal Chemistry
- Vol. 47 (25) , 6373-6383
- https://doi.org/10.1021/jm049902r
Abstract
Deriving general knowledge from high-throughput screening data is made difficult by the significant amount of noise, arising primarily from false positives, in the data. The paradigm established for screening an encoded combinatorial library on polymeric support, an ECLiPS library, has a significant amount of built-in redundancy. Because of this redundancy, the resulting data can be interpreted through a rigorous statistical analysis procedure, thereby significantly reducing the number of false positives. Here, we develop the statistical models used to analyze data from high-throughput screens of ECLiPS libraries to derive unbiased true hit rates. These hit rates can also be calculated on subsets of the collection such as those compounds containing a carboxylic acid or those with molecular weight below 350 Da. The relative value of the hit rate on the subset of the collection can then be compared to the overall hit rate to determine the effect of the substructure or physical property on the likelihood of a molecule having biological activity. Here, we show the effects that various functional groups and the standard physical properties, molecular weight, hydrogen bond donors, hydrogen bond acceptors, log P, and rotatable bonds, have on the likelihood of a compound being biologically active. To our knowledge this is the first published account of the use of high-throughput screening data to elucidate the effects of physical properties and substructures on the likelihood of compounds showing biological activity over a broad range of pharmaceutically relevant targets.
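The subset comparison described in the abstract can be illustrated with a minimal sketch. This is not the statistical model the article develops: it skips the redundancy-based false-positive correction entirely and only shows the final ratio of a subset hit rate to the overall hit rate. The `Compound` record, its field names, and the toy data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Compound:
    """Hypothetical screening record; fields are assumptions for this sketch."""
    mol_weight: float          # molecular weight in Da
    has_carboxylic_acid: bool  # substructure flag
    is_hit: bool               # assumed to already be a confirmed (true) hit


def hit_rate(compounds: Iterable[Compound]) -> float:
    """Fraction of compounds flagged as hits."""
    items = list(compounds)
    if not items:
        raise ValueError("cannot compute a hit rate on an empty collection")
    return sum(c.is_hit for c in items) / len(items)


def relative_hit_rate(
    compounds: Iterable[Compound],
    in_subset: Callable[[Compound], bool],
) -> float:
    """Subset hit rate divided by the overall hit rate.

    Values above 1 suggest the property is associated with a higher
    likelihood of activity; values below 1 suggest the opposite.
    """
    items = list(compounds)
    overall = hit_rate(items)
    if overall == 0:
        raise ValueError("overall hit rate is zero; the ratio is undefined")
    return hit_rate([c for c in items if in_subset(c)]) / overall


# Toy data, invented for illustration only.
toy = [
    Compound(300.0, True, True),
    Compound(320.0, False, False),
    Compound(400.0, True, False),
    Compound(450.0, False, False),
]

low_mw_ratio = relative_hit_rate(toy, lambda c: c.mol_weight < 350)
acid_ratio = relative_hit_rate(toy, lambda c: c.has_carboxylic_acid)
```

In this toy collection the overall hit rate is 1/4 and each subset has a hit rate of 1/2, so both ratios come out to 2. A real analysis would also need an uncertainty estimate on each ratio, which this sketch omits.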
This publication has 22 references indexed in Scilit:
- Characteristic Physical Properties and Structural Fragments of Marketed Oral Drugs. Journal of Medicinal Chemistry, 2003
- Molecular Properties That Influence the Oral Bioavailability of Drug Candidates. Journal of Medicinal Chemistry, 2002
- Property distribution of drug-related chemical databases. Journal of Computer-Aided Molecular Design, 2000
- Predicting Human Oral Bioavailability of a Compound: Development of a Novel Quantitative Structure-Bioavailability Relationship. Pharmaceutical Research, 2000
- Properties of Known Drugs. 2. Side Chains. Journal of Medicinal Chemistry, 1999
- Identification of Biological Activity Profiles Using Substructural Analysis and Genetic Algorithms. Journal of Chemical Information and Computer Sciences, 1998
- High-throughput screening of historic collections: Observations on file size, biological targets, and file diversity. Biotechnology & Bioengineering, 1998
- Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews, 1997
- The Properties of Known Drugs. 1. Molecular Frameworks. Journal of Medicinal Chemistry, 1996
- Complex synthetic chemical libraries indexed with molecular tags. Proceedings of the National Academy of Sciences, 1993