The Hidden Component of Size in Two-Dimensional Fragment Descriptors: Side Effects on Sampling in Bioactive Libraries
- 1 July 1999
- journal article
- Published by American Chemical Society (ACS) in Journal of Medicinal Chemistry
- Vol. 42 (15) , 2887-2900
- https://doi.org/10.1021/jm980708c
Abstract
We have carried out a number of sampling experiments in libraries of bioactive compounds to illustrate how size biases introduced by two-dimensional (2D) fragment distance functions may provide misleading information about the diversity of compound subsets. The number of different biological targets covered by a given subset is used as a measure of bioactive diversity, and it is considered to be the relevant property with which 2D diversity should correlate. Since the nature of the size biases depends on the way in which 2D distance is computed, we investigated three different methods of calculating distance. Use of 1-Tanimoto as a dissimilarity measure leads to the spurious conclusion that collections of structurally small compounds are inherently more diverse than other collections which may cover a broader range of sizes and more biological targets. XOR or squared Euclidean distance, by contrast, shows a preference for subsets of structurally larger compounds, but this does not appear to have as many adverse consequences in terms of target coverage. A simple product of 1-Tanimoto and XOR tends to equalize the opposing size effects of the two component distance functions and leads to a relatively unbiased means of comparing structures. Results here suggest that careful consideration should be given to the way in which chemical structures are compared whenever 2D fragment descriptors are used.
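The three distance functions named in the abstract can be sketched on binary fragment fingerprints as follows. This is a minimal illustration, not the authors' implementation: fingerprints are assumed to be Python sets of "on" bit indices, XOR is assumed to be the raw count of differing bits (which equals the squared Euclidean distance for 0/1 vectors), and the function names and toy fingerprints are hypothetical. The paper may normalize or scale these quantities differently.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """1 - Tanimoto: one minus shared bits over bits set in either fingerprint."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def xor_distance(a: set, b: set) -> int:
    """Number of bits set in exactly one fingerprint (Hamming distance).

    For binary vectors this equals the squared Euclidean distance.
    Left unnormalized here, which is an assumption of this sketch.
    """
    return len(a ^ b)


def product_distance(a: set, b: set) -> float:
    """Simple product of the two measures, as described in the abstract."""
    return tanimoto_distance(a, b) * xor_distance(a, b)


# Toy fingerprints (invented for illustration, not data from the paper).
# A pair of "small" structures: 4 bits each, 2 shared.
small_a, small_b = {0, 1, 2, 3}, {2, 3, 4, 5}
# A pair of "large" structures: 40 bits each, 30 shared.
large_a, large_b = set(range(0, 40)), set(range(10, 50))

for label, x, y in [("small", small_a, small_b), ("large", large_a, large_b)]:
    print(
        label,
        round(tanimoto_distance(x, y), 3),
        xor_distance(x, y),
        round(product_distance(x, y), 3),
    )
```

In this toy case the small pair scores as more distant under 1-Tanimoto while the large pair scores as more distant under XOR, which is the direction of the opposing size effects the abstract describes. The example says nothing about how well the product balances them in real libraries.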