Abstract
Determining the number of subpopulations from sample data is a major problem in cluster analysis. In this study we assume that the subpopulations correspond to the modes of the population density function f. We propose test statistics based on the kth nearest neighbor clustering method to investigate the modality of f, and we describe a modified bootstrap procedure for estimating the sample significance levels of these statistics in the univariate case. The performance of this procedure in determining the number of subpopulations is illustrated with simulated and real data sets.
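
As a rough illustration of the idea that subpopulations correspond to modes of f, the following sketch computes the standard univariate kth nearest neighbor density estimate, f_hat(x) = k / (2 n d_k(x)), where d_k(x) is the distance from x to its kth nearest sample point, and counts the local maxima of that estimate on a grid. The function names `knn_density` and `count_modes`, the grid size, and the mode-counting rule are assumptions made here for illustration; the abstract specifies neither the test statistics nor the modified bootstrap, so neither is reproduced.

```python
def knn_density(x, sample, k):
    """kth nearest neighbor density estimate at x (univariate).

    f_hat(x) = k / (2 * n * d_k(x)), where d_k(x) is the distance
    from x to its kth nearest point in the sample.
    """
    dists = sorted(abs(x - s) for s in sample)
    d_k = dists[k - 1]
    # A zero distance (ties at x) makes the estimate unbounded.
    return k / (2.0 * len(sample) * d_k) if d_k > 0 else float("inf")


def count_modes(sample, k, grid_size=200):
    """Count local maxima of the kNN density estimate on an even grid.

    Hypothetical helper: an illustrative mode count, not the test
    statistic of the paper.
    """
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    dens = [knn_density(g, sample, k) for g in grid]

    modes = 0
    rising = True  # treat the left edge as rising so an edge peak counts
    for i in range(1, grid_size):
        if dens[i] > dens[i - 1]:
            rising = True
        elif dens[i] < dens[i - 1]:
            if rising:
                modes += 1  # a rise followed by a fall marks one mode
            rising = False
    if rising:
        modes += 1  # estimate still rising (or flat) at the right edge
    return modes
```

The kth nearest neighbor estimate is typically rough for small k, so a raw mode count like this tends to overstate the number of modes; that is one reason a significance test, rather than a direct count, is needed to decide how many modes are real.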
