Cross validation of nearest neighbour discriminant analysis—a warning to SAS users

1 January 1994

journal article
research article
Published by Taylor & Francis in Journal of Statistical Computation and Simulation

Vol. 49 (3-4), 129-140
https://doi.org/10.1080/00949659408811566

Abstract

The SAS statistical package contains a general-purpose discriminant procedure, DISCRIM. Among the options available for this procedure are ones for performing nearest neighbour discriminant analysis and cross-validation. Each of these works well enough when used separately, but when the two options are used together an optimistic bias in the cross-validated performance emerges. For certain parameter values this bias can be dramatically large. The cause of the problem is analyzed mathematically for the two-class case with uniformly distributed data and demonstrated by simulation for normal data. The corresponding misbehaviour for multiple classes is also demonstrated by Monte Carlo simulation. A modification to the procedure that would remove the bias is proposed.
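For readers who want a reference point, here is a minimal Python sketch of honest leave-one-out cross-validation for a k-nearest-neighbour rule. It is not SAS code and does not reproduce the paper's analysis or the proposed fix. It assumes two classes drawn from the same uniform distribution, so no rule can beat 50% error, which mirrors the null setting the abstract mentions. All function names (`knn_predict`, `loo_error`, `resub_error`, `make_null_data`) are hypothetical. For contrast, the sketch also computes the resubstitution error, where each point remains in the training set and so acts as its own neighbour. This is one familiar source of optimism and is used here only as an illustration. It is not a claim about the specific mechanism inside PROC DISCRIM.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_predict(train, x, k):
    # Majority vote among the k training points nearest to x.
    # Vote ties (possible only with even k or more than two classes) go to the smallest label.
    nearest = sorted(train, key=lambda p: sq_dist(p[0], x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


def loo_error(data, k):
    # Leave-one-out: classify each observation using all *other* observations only.
    errors = 0
    for i, (x, y) in enumerate(data):
        held_in = data[:i] + data[i + 1:]  # observation i is excluded
        if knn_predict(held_in, x, k) != y:
            errors += 1
    return errors / len(data)


def resub_error(data, k):
    # Resubstitution: each observation stays in the training set, so it is
    # (almost surely) its own nearest neighbour. This is optimistic by construction.
    errors = sum(1 for x, y in data if knn_predict(data, x, k) != y)
    return errors / len(data)


def make_null_data(n_per_class, dim, seed):
    # Hypothetical null setup: both classes drawn from the SAME uniform
    # distribution on [0, 1)^dim, so with equal class sizes the true error rate is 0.5.
    rng = random.Random(seed)
    return [(tuple(rng.random() for _ in range(dim)), label)
            for label in (0, 1) for _ in range(n_per_class)]


if __name__ == "__main__":
    data = make_null_data(n_per_class=100, dim=2, seed=1)
    for k in (1, 5):
        print(f"k={k}: resubstitution={resub_error(data, k):.3f}, "
              f"leave-one-out={loo_error(data, k):.3f}")
```

On data like this, an honest leave-one-out estimate should stay close to 0.5. With k = 1, the resubstitution estimate is 0 because every point is its own nearest neighbour. Any cross-validation procedure that reports error well below 0.5 in this setting is showing an optimistic bias of the kind the paper warns about.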

