An Information‐Theoretic Approach to Descriptor Selection for Database Profiling and QSAR Modeling
- 1 July 2003
- journal article
- review article
- Published by Wiley in QSAR & Combinatorial Science
- Vol. 22 (5) , 487-497
- https://doi.org/10.1002/qsar.200310001
Abstract
In order to rationalize the selection of molecular descriptors for QSAR and other applications, we have adapted the Shannon entropy concept that was originally developed in digital communication theory. The approach has been extended to facilitate the large‐scale analysis of molecular descriptors and their information content in diverse compound databases. This has enabled us to identify descriptors with consistently high information content. Furthermore, it has been possible to select descriptors that are sensitive to systematic property differences in diverse compound collections (synthetic compounds, natural products, drug‐like molecules, or drugs) and, in addition, to quantify such database‐specific differences. Selection of descriptors based on information content has been proven useful for binary QSAR analysis. In this review, we describe the principles of entropy‐based descriptor selection and discuss different applications.
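The abstract does not give computational details, so the following is only a minimal sketch of how entropy-based descriptor ranking might look. It uses the standard Shannon entropy, H = -Σ p_i log2 p_i, computed over a histogram of each descriptor's values in a compound collection. The equal-width binning, the default of 10 bins, and the names `shannon_entropy` and `rank_descriptors` are illustrative assumptions, not the authors' published procedure.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=10):
    """Shannon entropy (in bits) of a descriptor's values over equal-width bins.

    Assumption: equal-width binning between the observed min and max;
    the review itself may use a different binning scheme.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant descriptor carries no information.
        return 0.0
    width = (hi - lo) / n_bins
    # Assign each value to a bin; the maximum value goes into the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_descriptors(table, n_bins=10):
    """Rank descriptors (name -> list of values) by decreasing entropy."""
    scores = {name: shannon_entropy(vals, n_bins) for name, vals in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, a descriptor whose values spread evenly across the bins scores close to log2(n_bins), while a descriptor that is nearly constant across the collection scores close to zero, which matches the idea of preferring descriptors with high information content.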