Assessment of diagnostic markers by goodness‐of‐fit tests

Abstract
Receiver operating characteristic (ROC) curves are useful statistical tools for assessing the accuracy of diagnostic markers or for comparing new diagnostic markers with established ones. The most common index employed for these purposes is the area under the ROC curve (θ), and several statistical tests exist that test the null hypothesis H0: θ = 0.5 or, in the case of two‐marker comparisons, H0: θ1 = θ2, against alternatives of interest. In this paper we show that goodness‐of‐fit tests of uniformity of the distribution of the false positive (true positive) rates can be used instead of tests based on the area index. A semi‐parametric approach is based on a completely specified distribution of marker measurements for either the healthy (F) or the diseased (G) subjects, and this is extended to the two‐marker case. We then extend the approach to the one‐ and two‐marker cases when neither distribution is specified (the non‐parametric case). In general, ROC‐based tests are more powerful than goodness‐of‐fit tests for location differences between the distributions of healthy and diseased subjects. However, ROC‐based tests are less powerful when location‐scale differences exist (producing ROC curves that cross the diagonal), and they are incapable of discriminating between healthy and diseased samples when θ = 0.5 but F ≠ G. In these cases, goodness‐of‐fit tests have a distinct advantage over ROC‐based tests. In conclusion, ROC methodology should be used with recognition of its potential limitations and should be replaced by goodness‐of‐fit tests when appropriate. The latter are a viable alternative and can be used as a ‘black box’ or as an exploratory first step in the evaluation of novel diagnostic markers. Copyright © 2003 John Wiley & Sons, Ltd.
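
The following is a minimal sketch, not the authors' implementation, of the contrast described above in the semi‐parametric setting where the healthy distribution F is completely specified (here assumed, purely for illustration, to be standard normal). The simulated data, sample sizes, and choice of a Kolmogorov–Smirnov statistic as the goodness‐of‐fit test of uniformity are illustrative assumptions; the diseased distribution G is given the same mean as F but a larger variance, so that θ ≈ 0.5 even though F ≠ G.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated marker values: healthy ~ F = N(0, 1), diseased ~ G = N(0, 2^2).
# A location-scale alternative with equal means keeps theta near 0.5
# although F != G (the ROC curve crosses the diagonal).
healthy = rng.normal(0.0, 1.0, size=200)
diseased = rng.normal(0.0, 2.0, size=200)

# ROC-based test: the Mann-Whitney U statistic estimates the area under
# the ROC curve, theta = P(diseased > healthy); test H0: theta = 0.5.
u, p_auc = stats.mannwhitneyu(diseased, healthy, alternative="two-sided")
theta_hat = u / (len(diseased) * len(healthy))

# Goodness-of-fit alternative: under H0 (no discrimination) the false
# positive rates 1 - F(Y_j) evaluated at the diseased measurements are
# uniform on (0, 1); test uniformity with a Kolmogorov-Smirnov statistic.
fpr = 1.0 - stats.norm.cdf(diseased)      # F fully specified (semi-parametric case)
ks_stat, p_gof = stats.kstest(fpr, "uniform")

print(f"AUC estimate: {theta_hat:.3f}, AUC-test p-value: {p_auc:.3f}")
print(f"KS uniformity statistic: {ks_stat:.3f}, p-value: {p_gof:.3f}")
```

With data of this kind the area‐based test typically fails to reject (θ̂ near 0.5), while the uniformity test detects the departure of the false positive rates from the uniform distribution, illustrating the situation in which goodness‐of‐fit tests retain power where ROC‐based tests do not.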