Approaches to Measure Chemical Similarity – a Review
- 1 December 2003
- journal article
- review article
- Published by Wiley in QSAR & Combinatorial Science
- Vol. 22 (9-10), 1006-1026
- https://doi.org/10.1002/qsar.200330831
Abstract
Although the concept of similarity is convenient for humans, a formal definition of similarity between chemical compounds is needed to enable automatic decision‐making. The objective of similarity measures in toxicology and drug design is to allow assessment of chemical activities. The ideal similarity measure should be relevant to the activity of interest. The relevance could be established by exploiting knowledge about the fundamental chemical and biological processes responsible for the activity. Unfortunately, this knowledge is rarely available, and therefore different approximations have been developed based on similarity between structures or descriptor values. Various methods are reviewed, ranging from two‐dimensional, three‐dimensional and field approaches to recent methods based on “Atoms in Molecules” theory. All these methods attempt to describe chemical compounds by a set of numerical values and define some means of comparison between them. The review analyzes two potential pitfalls of this methodology: the loss of information in the representations of molecular structures, and the relevance of a particular representation and chosen similarity measure to the activity. A brief review of known methods for descriptor selection is also provided. The popular “neighborhood behavior” principle is criticized, since proximity with respect to descriptors does not necessarily mean proximity with respect to activity. Structural similarity should also be used with care, as it does not always imply similar activity, as shown by examples. We note that similarity measures and classification techniques based on distances rely on certain data distribution assumptions. If these assumptions are not satisfied for a given dataset, the results could be misleading. A discussion of similarity in descriptor space in the context of applicability domain assessment of QSAR models is also provided.
Finally, it is shown that descriptor‐based similarity analysis is prone to errors if the relationship between the activity and the descriptors has not been previously established. A justification for the use of a particular similarity measure should be provided for every specific activity, either by expert knowledge or by data modeling techniques.
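As a rough illustration of what "describing compounds by numerical values and comparing them" can look like in practice, the sketch below computes two widely used measures: a Euclidean distance between descriptor vectors and a Tanimoto coefficient between fingerprint bit sets. The abstract does not name these measures; they are assumptions chosen for illustration, and the function names and input values are hypothetical, not data from the review.

```python
import math


def euclidean_distance(a, b):
    """Distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(bits_a & bits_b) / union


# Hypothetical inputs, for illustration only.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(tanimoto({1, 2, 3}, {2, 3, 4}))              # 0.5
```

Neither number says anything about activity by itself. In line with the abstract's caution, such a measure is only meaningful once its relationship to the activity of interest has been established.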