Selecting Optimally Diverse Compounds from Structure Databases: A Validation Study of Two-Dimensional and Three-Dimensional Molecular Descriptors
- 1 April 1997
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Medicinal Chemistry
- Vol. 40 (8) , 1219-1229
- https://doi.org/10.1021/jm960352+
Abstract
The efficiency of the drug discovery process can be significantly improved by using design techniques that maximize the diversity of structure databases or combinatorial libraries. Here, several physicochemical descriptors were investigated for quantifying molecular diversity. To identify valid descriptors, the relationship between physicochemical metrics and biological activity was studied on the basis of the 2D or 3D topological similarity of molecules. Using these descriptors, subsets of compounds were selected from a database containing diverse templates and 55 biological classes, and it was evaluated whether the obtained subsets represent all biological properties and structural variations of the original database. In addition, hierarchical cluster analyses were used to group molecules from the parent database that should have similar biological properties. Using various sets of structurally similar molecules, it was possible to derive quantitative measures of compound similarity in relation to biological properties. A similarity radius was estimated for 2D fingerprints and for molecular steric fields; compounds within this radius of another molecule were shown to have comparable biological properties. This study demonstrates that 2D fingerprints used as the primary descriptor, alone or in combination with other metrics, allow global diversity to be handled. In addition, standard atom-pair descriptors or molecular steric fields can be used to correlate structural diversity with biological activity. Hence, these two descriptors can be classified as secondary descriptors useful for analog library design, whereas 2D fingerprints are suited to designing a general library for lead discovery. Based on these findings, an optimally diverse subset containing only 38% of the entire IC93 database was generated using 2D fingerprints. In this subset, no structure has a Tanimoto coefficient greater than 0.85 to any other, yet all biological classes are represented.
This reduction of redundancy led to a child database with the same physicochemical diversity space, which contains the same information as the original database.
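To make the selection criterion concrete, here is a minimal sketch that computes the Tanimoto coefficient T(A, B) = |A ∩ B| / |A ∪ B| for binary 2D fingerprints and applies a simple greedy filter that keeps a compound only if its similarity to every compound already kept is at most 0.85. The abstract does not describe the authors' selection algorithm, so this order-dependent, sphere-exclusion-style filter is an assumption. The set-of-bits fingerprint representation, the hypothetical helper names `tanimoto`, `select_diverse` and `classes_covered`, and the toy data are also assumptions, not taken from the study or the IC93 database.

```python
from typing import Dict, FrozenSet, List, Set

# Assumption: a fingerprint is modelled as the set of "on" bit positions,
# a simplification of the fixed-length bit strings used for real 2D fingerprints.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints."""
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints are indistinguishable; treat them as identical.
        return 1.0
    return len(a & b) / union


def select_diverse(fps: Dict[str, Fingerprint], max_sim: float = 0.85) -> List[str]:
    """Hypothetical greedy filter: keep a compound only if its Tanimoto
    similarity to every compound already kept is <= max_sim.
    Compounds are visited in dict order, so the result depends on that order."""
    kept: List[str] = []
    for cid, fp in fps.items():
        if all(tanimoto(fp, fps[k]) <= max_sim for k in kept):
            kept.append(cid)
    return kept


def classes_covered(ids: List[str], bio_class: Dict[str, str]) -> Set[str]:
    """Set of biological classes represented by the given compound IDs."""
    return {bio_class[i] for i in ids}


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    fps = {
        "c1": frozenset({1, 2, 3, 4, 5, 6, 7}),
        "c2": frozenset({1, 2, 3, 4, 5, 6, 7, 8}),  # T(c1, c2) = 7/8 = 0.875
        "c3": frozenset({10, 11, 12}),
        "c4": frozenset({1, 2, 10, 11}),
    }
    bio_class = {"c1": "A", "c2": "A", "c3": "B", "c4": "C"}

    subset = select_diverse(fps)
    print(subset)                    # ['c1', 'c3', 'c4']
    print(len(subset) / len(fps))    # 0.75
    print(classes_covered(subset, bio_class) == set(bio_class.values()))  # True
```

Because the filter visits compounds in a fixed order, a different ordering can yield a different subset. Checking class coverage afterwards mirrors the abstract's test that every biological class is still present in the reduced database.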