Evaluating Virtual Screening Methods: Good and Bad Metrics for the “Early Recognition” Problem


9 February 2007

journal article
research article
Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling

Vol. 47 (2), 488-508
https://doi.org/10.1021/ci600426e

Abstract

Many metrics are currently used to evaluate the performance of ranking methods in virtual screening (VS), for instance, the area under the receiver operating characteristic curve (ROC), the area under the accumulation curve (AUAC), the average rank of actives, the enrichment factor (EF), and the robust initial enhancement (RIE) proposed by Sheridan et al. In this work, we show that the ROC, the AUAC, and the average rank metrics share the same inappropriate behaviors, which make them poor metrics for comparing VS methods whose purpose is to rank actives early in an ordered list (the “early recognition problem”). In doing so, we derive mathematical formulas that relate those metrics to one another. Moreover, we show that the EF metric is not sensitive to ranking performance before and after the cutoff. Instead, we formally generalize the ROC metric to the early recognition problem, which leads us to propose a novel metric called the Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC); it turns out to retain the discrimination power of the RIE metric while incorporating the statistical significance of ROC and its well-behaved boundaries. Finally, two major sources of error, namely, the statistical error and the “saturation effects”, are examined. This leads to practical recommendations for the number of actives, the number of inactives, and the “early recognition” importance parameter that one should use when comparing ranking methods. Although this work is applied specifically to VS, it is general and can be used to analyze any method that needs to segregate actives toward the front of a rank-ordered list.
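
The abstract names several rank-based metrics without defining them. The sketch below is a minimal illustration written for this summary, not the authors' code, of how each can be computed from a ranked list in which 1 marks an active and 0 an inactive, best score first. All function names are hypothetical. Assumptions to check against the paper: ranks are 1-based, the EF cutoff is floored to a whole number of compounds, the AUAC uses trapezoidal integration from (0, 0), RIE uses the exponential weight exp(-α r_i / N) divided by its expectation under random ranking, and BEDROC is RIE rescaled between its minimum and maximum attainable values. The value α = 20 and the 20-compound list are illustrative only.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: `ranked` is already sorted best score first,
# with 1 for an active and 0 for an inactive.


def active_ranks(ranked: Sequence[int]) -> List[int]:
    """1-based ranks r_i of the actives, in increasing order."""
    return [i + 1 for i, flag in enumerate(ranked) if flag]


def mean_active_rank(ranked: Sequence[int]) -> float:
    r = active_ranks(ranked)
    return sum(r) / len(r)


def roc_auc(ranked: Sequence[int]) -> float:
    """Fraction of (active, inactive) pairs in which the active is ranked first."""
    big_n, r = len(ranked), active_ranks(ranked)
    n = len(r)
    # r_i - i inactives precede the i-th active (i is 1-based)
    inactives_ahead = sum(ri - i for i, ri in enumerate(r, start=1))
    return 1.0 - inactives_ahead / (n * (big_n - n))


def auac(ranked: Sequence[int]) -> float:
    """Trapezoidal area under the accumulation curve (fraction of actives
    found versus fraction of the list screened), starting from (0, 0)."""
    big_n, n = len(ranked), sum(ranked)
    found, area = 0, 0.0
    for flag in ranked:
        prev = found
        found += 1 if flag else 0
        area += (prev + found) / (2.0 * n * big_n)  # one trapezoid of width 1/N
    return area


def enrichment_factor(ranked: Sequence[int], chi: float) -> float:
    """Hit rate in the top chi fraction divided by the overall hit rate."""
    big_n, n = len(ranked), sum(ranked)
    k = int(chi * big_n)  # cutoff size, floored (an assumption)
    if k == 0:
        raise ValueError("cutoff fraction selects no compounds")
    return (sum(ranked[:k]) / k) / (n / big_n)


def rie(ranked: Sequence[int], alpha: float) -> float:
    """Mean exponential weight of the actives divided by its expectation
    under a uniformly random ranking."""
    big_n, r = len(ranked), active_ranks(ranked)
    observed = sum(math.exp(-alpha * ri / big_n) for ri in r) / len(r)
    expected = (1.0 - math.exp(-alpha)) / (big_n * (math.exp(alpha / big_n) - 1.0))
    return observed / expected


def bedroc(ranked: Sequence[int], alpha: float) -> float:
    """RIE rescaled so the best possible ranking scores 1 and the worst 0."""
    ra = sum(ranked) / len(ranked)  # ratio of actives R_a = n / N
    rie_max = (1.0 - math.exp(-alpha * ra)) / (ra * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ra)) / (ra * (1.0 - math.exp(alpha)))
    return (rie(ranked, alpha) - rie_min) / (rie_max - rie_min)


if __name__ == "__main__":
    # Illustrative list of 20 compounds with actives at ranks 1, 3, 6 and 15.
    ranked = [1, 0, 1, 0, 0, 1] + [0] * 8 + [1] + [0] * 5
    for name, value in [
        ("ROC", roc_auc(ranked)),
        ("AUAC", auac(ranked)),
        ("mean rank", mean_active_rank(ranked)),
        ("EF(10%)", enrichment_factor(ranked, 0.10)),
        ("RIE(20)", rie(ranked, 20.0)),
        ("BEDROC(20)", bedroc(ranked, 20.0)),
    ]:
        print(f"{name:>10}: {value:.4f}")
```

With these particular definitions, the trapezoidal AUAC satisfies AUAC = R_a/2 + (1 - R_a)·ROC exactly, and both are linear in the mean active rank (AUAC = 1 + 1/(2N) - mean rank / N), which illustrates the kind of relation between ROC, AUAC, and average rank that the abstract refers to. The min-max rescaling used in `bedroc` is algebraically equal to the closed form BEDROC = RIE · R_a sinh(α/2) / (cosh(α/2) - cosh(α/2 - αR_a)) + 1/(1 - e^{α(1 - R_a)}).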

