Abstract
The SAS statistical package contains a general purpose discriminant procedure. Discrim. Among the options available for this procedure are ones for performing nearest neighbour discriminant analysis and cross-validation. Each of these works well enough when used separately but, when the two options are used together, an optimistic bias in cross-validated performance emerges. For certain parameter values, this bias can be dramatically large. The cause of the problem is analyzed mathematically for the two-class case with uniformly distributed data and demonstrated by simulation for normal data. The corresponding misbehaviour for multiple classes is also demonstrated by Monte Carlo simulation. A modification to the procedure, which would remove the bias, is proposed.

This publication has 5 references indexed in Scilit: