Cross validation of nearest neighbour discriminant analysis: a warning to SAS users
- 1 January 1994
- journal article
- research article
- Published by Taylor & Francis in Journal of Statistical Computation and Simulation
- Vol. 49 (3-4), 129-140
- https://doi.org/10.1080/00949659408811566
Abstract
The SAS statistical package contains a general-purpose discriminant procedure, DISCRIM. Among the options available for this procedure are ones for performing nearest neighbour discriminant analysis and cross-validation. Each of these works well enough when used separately but, when the two options are used together, an optimistic bias in cross-validated performance emerges. For certain parameter values, this bias can be dramatically large. The cause of the problem is analyzed mathematically for the two-class case with uniformly distributed data and demonstrated by simulation for normal data. The corresponding misbehaviour for multiple classes is also demonstrated by Monte Carlo simulation. A modification to the procedure, which would remove the bias, is proposed.
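As a rough illustration of the kind of check the abstract describes, the sketch below runs a generic leave-one-out cross-validation of a k-nearest-neighbour majority vote on two classes drawn from the same uniform distribution. It is not the SAS DISCRIM procedure and not the modification proposed in the article; the function name `loo_knn_error`, the sample sizes, and the choice k = 3 are assumptions made for the example. Because the two classes are indistinguishable by construction, the true error rate is 0.5, so a cross-validated estimate far below 0.5 on data like this would point to an optimistic bias in the procedure being tested.

```python
import random


def loo_knn_error(points, labels, k):
    """Leave-one-out error rate of a k-nearest-neighbour majority vote.

    Hypothetical helper for illustration only; labels are 0 or 1 and
    k should be odd so that the vote cannot tie.
    """
    n = len(points)
    errors = 0
    for i in range(n):
        # The held-out point is excluded BEFORE its neighbours are ranked.
        others = [j for j in range(n) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        )
        votes = sum(labels[j] for j in others[:k])
        predicted = 1 if 2 * votes > k else 0
        errors += predicted != labels[i]
    return errors / n


# Two classes drawn from the SAME uniform distribution (assumed setup),
# so no classifier can do better than chance.
rng = random.Random(0)
n_per_class = 100
points = [(rng.random(), rng.random()) for _ in range(2 * n_per_class)]
labels = [0] * n_per_class + [1] * n_per_class

error = loo_knn_error(points, labels, k=3)
print(f"leave-one-out error: {error:.3f}")
```

The design point in this sketch is that the held-out observation is removed before its neighbours are ranked, so all k votes come from other observations.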
This publication has 5 references indexed in Scilit:
- The jack-knife with a stepwise discriminant algorithm: a warning to BMDP users. Journal of Applied Statistics, 1993
- Monte Carlo randomization tests: A reply to Bradbury. British Journal of Mathematical and Statistical Psychology, 1987
- The approximate randomization test as an alternative to the F test in analysis of variance. British Journal of Mathematical and Statistical Psychology, 1981
- Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967
- Extension of the Neyman-Pearson theory of tests to discontinuous variates. Biometrika, 1950