Abstract
Given a prediction rule based on a set of patients, what is the probability of incorrectly predicting the outcome of a new patient? Call this probability the true error. An optimistic estimate is the apparent error, or the proportion of incorrect predictions on the original set of patients, and it is the goal of this article to study estimates of the excess error, or the difference between the true and apparent errors. I consider three estimates of the excess error: cross-validation, the jackknife, and the bootstrap. Using simulations and real data, the three estimates for a specific prediction rule are compared. When the prediction rule is allowed to be complicated, overfitting becomes a real danger, and excess error estimation becomes important. The prediction rule chosen here is moderately complicated, involving a variable-selection procedure based on forward logistic regression.

This publication has 0 references indexed in Scilit: