Borrowing Information Across Populations in Estimating Positive and Negative Predictive Values

Abstract
Summary: A marker’s capacity to predict the risk of a disease depends on the prevalence of the disease in the target population and on the marker’s classification accuracy, i.e. its ability to discriminate diseased subjects from non-diseased subjects. The latter is often considered an intrinsic property of the marker; it is independent of disease prevalence and hence more likely to be similar across populations than risk prediction measures are. In this paper, we are interested in evaluating the population-specific performance of a risk prediction marker in terms of the positive predictive value (PPV) and negative predictive value (NPV) at given thresholds, when samples are available from the target population as well as from another population. A default strategy is to estimate PPV and NPV using samples from the target population only. However, when the marker’s classification accuracy, as characterized by a specific point on the receiver operating characteristic curve, is similar across populations, borrowing information across populations allows increased efficiency in estimating PPV and NPV. We develop estimators that optimally combine information across populations. We apply this methodology to a cross-sectional study where we evaluate PCA3 as a risk prediction marker for prostate cancer among subjects with or without a previous negative biopsy.
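
As a brief illustration of why predictive values are population-specific while classification accuracy need not be, the sketch below applies Bayes' rule to one fixed point on the ROC curve (a sensitivity and specificity pair) under two prevalences. It is not the estimator developed in this paper; the function name `predictive_values` and all numerical inputs are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' rule at a single marker threshold.

    All three arguments are probabilities in [0, 1]. The name and the
    example inputs below are illustrative assumptions, not taken from
    the paper.
    """
    # P(marker positive) = true positives + false positives
    p_pos = sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)
    # P(marker negative) = true negatives + false negatives
    p_neg = specificity * (1.0 - prevalence) + (1.0 - sensitivity) * prevalence
    ppv = sensitivity * prevalence / p_pos
    npv = specificity * (1.0 - prevalence) / p_neg
    return ppv, npv


# Same ROC point (hypothetical values), two hypothetical prevalences.
for prev in (0.10, 0.30):
    ppv, npv = predictive_values(sensitivity=0.80, specificity=0.90, prevalence=prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.4f}  NPV={npv:.4f}")
```

With these made-up inputs, PPV rises from roughly 0.47 to roughly 0.77 as prevalence moves from 0.10 to 0.30, even though sensitivity and specificity are unchanged. This is the sense in which classification accuracy may be shared across populations while PPV and NPV are not.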