Enhancing the Effectiveness of Virtual Screening by Fusing Nearest Neighbor Lists: A Comparison of Similarity Coefficients
- 1 September 2004
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 44 (5) , 1840-1848
- https://doi.org/10.1021/ci049867x
Abstract
This paper evaluates the effectiveness of various similarity coefficients for 2D similarity searching when multiple bioactive target structures are available. Similarity searches using several different activity classes within the MDL Drug Data Report and the Dictionary of Natural Products databases are performed using BCI 2D fingerprints. Using data fusion techniques to combine the resulting nearest neighbor lists we obtain group recall results which, in many cases, are a considerable improvement on standard average recall values obtained for individual structures. It is shown that the degree of improvement can be related to the structural diversity of the activity class that is searched for, the best results being found for the most diverse groups. The group recall of active compounds using subsets of the class is also investigated: for highly self-similar activity classes, the group recall improvement saturates well before the full activity class size is reached. A rough correlation is found between the relative improvement using the group recall and the square of the number of unique compounds available in all of the merged lists. The Tanimoto coefficient is found unambiguously to be the best coefficient to use for the recovery of active compounds using multiple targets. Furthermore, when using the Tanimoto coefficient, the "MAX" fusion rule is found to be more effective than the "SUM" rule for the combination of similarity searches from multiple targets. The use of group recall can lead to improved enrichment in database searches and virtual screening.
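The fusion rules named in the abstract can be illustrated with a minimal sketch. It assumes score-based fusion: each database compound receives one Tanimoto score per target, and the scores are combined with either the maximum ("MAX") or the sum ("SUM"). The abstract does not say whether the paper fuses scores or ranks, so this is an illustration of the general idea rather than the authors' procedure. The function names, the set-of-bit-indices fingerprint representation, and the toy compounds are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Assumption: a fingerprint is modelled as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: bits in common / bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fuse(database: Dict[str, Fingerprint],
         targets: List[Fingerprint],
         rule: str) -> List[str]:
    """Rank compounds by fused similarity to all targets ("MAX" or "SUM")."""
    combine = {"MAX": max, "SUM": sum}[rule]
    fused = {
        name: combine(tanimoto(fp, target) for target in targets)
        for name, fp in database.items()
    }
    # Highest fused score first; ties are broken alphabetically by name.
    return sorted(fused, key=lambda name: (-fused[name], name))


def recall(ranking: List[str], actives: Set[str], top_n: int) -> float:
    """Fraction of the known actives found in the top_n ranked compounds."""
    return len(set(ranking[:top_n]) & actives) / len(actives)


# Hypothetical toy data: two target structures and three database compounds.
targets = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8})]
database = {
    "cpd_b": frozenset({1, 2, 7, 8}),  # moderately similar to both targets
    "cpd_x": frozenset({1, 2, 3, 9}),  # close to the first target only
    "cpd_d": frozenset({9, 10}),       # similar to neither target
}

ranking_max = fuse(database, targets, "MAX")
ranking_sum = fuse(database, targets, "SUM")
```

In this toy case the two rules disagree: MAX favours `cpd_x`, which closely resembles a single target, while SUM favours `cpd_b`, which resembles both targets moderately. Group recall in the sense of the abstract would then be computed with `recall` on the fused ranking, given a set of known actives.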