A bootstrap testing procedure for investigating the number of subpopulations

1 August 1985

journal article
research article
Published by Taylor & Francis in Journal of Statistical Computation and Simulation

Vol. 22 (2) , 99-112
https://doi.org/10.1080/00949658508810837

Abstract

Determining the number of subpopulations from sample data is a major problem in cluster analysis. We assume in this study that the subpopulations correspond to modes of the population density function f. We then propose test statistics based on the kth nearest neighbor clustering method to investigate the modality of f. A modified bootstrap procedure for estimating the sample significance levels of these statistics in the univariate case is described. The performance of this procedure in determining the number of subpopulations is illustrated with generated and real data sets.
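
The abstract does not give the test statistics or the details of the modified bootstrap, so the following is only a rough sketch of the general idea, not the article's procedure. It assumes a standard univariate kth nearest neighbor density estimate, k / (2 n d_k(x)), uses the number of local maxima of that estimate on a grid as a stand-in statistic, and calibrates it with a plain parametric bootstrap from a fitted normal distribution as the unimodal null. All function names and parameters (`knn_density`, `count_modes`, `bootstrap_pvalue`, `n_grid`, `n_boot`) are hypothetical.

```python
import numpy as np


def knn_density(grid, sample, k):
    """Univariate kth nearest neighbor density estimate: k / (2 n d_k(x))."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = sample.size
    # Distance from every grid point to every sample point.
    dist = np.abs(grid[:, None] - sample[None, :])
    # Distance to the kth nearest sample point (index k - 1 after partitioning).
    d_k = np.partition(dist, k - 1, axis=1)[:, k - 1]
    d_k = np.maximum(d_k, np.finfo(float).tiny)  # guard against zero distances
    return k / (2.0 * n * d_k)


def count_modes(values):
    """Count interior local maxima of a sequence of density values."""
    v = np.asarray(values, dtype=float)
    inner = v[1:-1]
    return int(np.sum((inner > v[:-2]) & (inner >= v[2:])))


def mode_statistic(sample, k, n_grid=200):
    """Stand-in statistic (an assumption): modes of the kNN estimate on a grid."""
    sample = np.asarray(sample, dtype=float)
    grid = np.linspace(sample.min(), sample.max(), n_grid)
    return count_modes(knn_density(grid, sample, k))


def bootstrap_pvalue(sample, k, n_boot=50, seed=0):
    """Parametric bootstrap p-value under a fitted normal (unimodal) null.

    This is a plain bootstrap chosen for illustration; it is not the
    modified bootstrap described in the article.
    """
    sample = np.asarray(sample, dtype=float)
    rng = np.random.default_rng(seed)
    observed = mode_statistic(sample, k)
    mu, sigma = sample.mean(), sample.std(ddof=1)
    exceed = 0
    for _ in range(n_boot):
        resample = rng.normal(mu, sigma, size=sample.size)
        if mode_statistic(resample, k) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (1 + exceed) / (n_boot + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Generated data: an equal mixture of two well-separated normals.
    data = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])
    print("modes:", mode_statistic(data, k=20))
    print("p-value:", bootstrap_pvalue(data, k=20))
```

A small p-value would suggest more modes than a single normal population typically produces at the chosen k. The result depends heavily on k and on the choice of null, both of which are assumptions of this sketch.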
