Evaluation of Similarity Measures for Searching the Dictionary of Natural Products Database
- 28 January 2003
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 43 (2) , 449-457
- https://doi.org/10.1021/ci025591m
Abstract
Similarity searches using combinations of seven different similarity coefficients and six different representations have been carried out on the Dictionary of Natural Products database. The objective was to discover if any special methods of searching apply to this database, which is very different in nature from the many synthetic databases that have been the subject of previous studies of similarity searching. Search effectiveness was assessed by a recall analysis of the search outputs from sets of pharmacologically active target structures. The different target sets produce exceptional but contradictory results for the Russell-Rao and Forbes coefficients, which have been shown to be due to a dependence on molecular size; these are the coefficients of choice in the case of large and small structures, respectively. Rankings from these results have been combined using a data fusion scheme and some small gains in performance were normally obtained by using substructural fingerprints and molecular holograms in combination with the Squared Euclidean or Tanimoto coefficients.
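The size dependence described in the abstract can be illustrated with a minimal sketch. It assumes binary fingerprints stored as Python sets of on-bit positions and uses the standard textbook forms of four of the coefficients named above; the notation (a and b for the bit counts of each fingerprint, c for the bits in common, n for the fingerprint length), the toy fingerprints, and all function names are illustrative assumptions, not taken from the article. The abstract does not specify the data fusion scheme, so the sum-of-ranks function below is only one common choice, shown as a stand-in.

```python
# Toy binary fingerprints as sets of on-bit positions (hypothetical data).
N_BITS = 16                       # assumed fingerprint length n
target = {0, 1, 2, 3}             # query structure
small = {0, 1}                    # small database structure
large = set(range(10))            # large database structure


def tanimoto(fp_a, fp_b):
    """c / (a + b - c): common bits over bits set in either fingerprint."""
    c = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - c
    return c / union if union else 0.0


def russell_rao(fp_a, fp_b, n_bits):
    """c / n: common bits over fingerprint length, so more bits can only help."""
    return len(fp_a & fp_b) / n_bits


def forbes(fp_a, fp_b, n_bits):
    """c * n / (a * b): dividing by both bit counts rewards sparse fingerprints."""
    a, b = len(fp_a), len(fp_b)
    return len(fp_a & fp_b) * n_bits / (a * b) if a and b else 0.0


def squared_euclidean(fp_a, fp_b):
    """a + b - 2c: a distance (lower is closer); counts the differing bits."""
    return len(fp_a ^ fp_b)


def fuse_by_rank_sum(rankings):
    """Combine rankings (best first) by summing each item's rank positions.

    Assumes every ranking covers the same items; ties break on item id.
    """
    totals = {}
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            totals[item] = totals.get(item, 0) + position
    return sorted(totals, key=lambda item: (totals[item], item))


if __name__ == "__main__":
    for name, fp in (("small", small), ("large", large)):
        print(
            name,
            tanimoto(target, fp),
            russell_rao(target, fp, N_BITS),
            forbes(target, fp, N_BITS),
            squared_euclidean(target, fp),
        )
```

On this toy data Russell-Rao scores the large structure higher (0.25 against 0.125), while Forbes scores the small one higher (4.0 against 1.6), which is the kind of opposite size bias the abstract reports. Squared Euclidean is a distance here, so a ranking built from it would sort in ascending order before being passed to a fusion step.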