A Nearest-Centroid Technique For Evaluating The Minimum-Variance Clustering Procedure

Abstract
It was posited that a good cluster solution has two characteristics: (1) it is stable across multiple random samples; and (2) its clusters accurately correspond to the populations from which the sample of data comes. A technique for evaluating a minimum-variance cluster solution (Ward, 1963) was presented that addresses both characteristics. In this technique, which follows the cross-validation paradigm, two data sets were drawn from a population mixture. One data set was cluster analyzed (by the minimum-variance procedure), and the centroid vectors for the solution were calculated. After the other data set was cluster analyzed, its objects were assigned to the nearest centroid calculated in the first data set. The outcome was a kappa statistic that measures the agreement between the minimum-variance cluster solution of the second data set and the classifications made by the nearest-centroid assignment rule. Results from Monte Carlo investigations of the technique showed that the new ...
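The procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`ward_clusters`, `cohen_kappa`, `nearest_centroid_kappa`, `draw_sample`), the naive agglomeration loop, and the synthetic two-population data are all hypothetical. The abstract does not say how cluster labels from the two solutions are aligned, so the sketch assumes the label permutation that maximizes kappa.

```python
import itertools
import random


def centroid(points):
    """Mean vector of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(data, k):
    """Naive minimum-variance (Ward) agglomeration down to k clusters.

    At each step, merge the pair of clusters whose union gives the smallest
    increase in within-cluster sum of squares. Returns one label per object.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            ca = centroid([data[i] for i in clusters[a]])
            cb = centroid([data[i] for i in clusters[b]])
            na, nb = len(clusters[a]), len(clusters[b])
            cost = na * nb / (na + nb) * sq_dist(ca, cb)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(data)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels


def cohen_kappa(a, b, k):
    """Cohen's kappa for two labelings over categories 0..k-1."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in range(k))
    return (p_obs - p_exp) / (1 - p_exp)


def nearest_centroid_kappa(sample_a, sample_b, k):
    """Cross-validate a Ward solution: cluster A, cluster B, compare on B."""
    labels_a = ward_clusters(sample_a, k)
    cents = [centroid([p for p, l in zip(sample_a, labels_a) if l == c])
             for c in range(k)]
    labels_b = ward_clusters(sample_b, k)
    # Classify each object in B by the nearest centroid computed from A.
    assigned = [min(range(k), key=lambda c: sq_dist(p, cents[c]))
                for p in sample_b]
    # Assumption: cluster numbering is arbitrary, so take the best alignment.
    return max(cohen_kappa([perm[l] for l in labels_b], assigned, k)
               for perm in itertools.permutations(range(k)))


def draw_sample(rng, n_per_pop=15):
    """Hypothetical mixture of two well-separated bivariate normal populations."""
    means = [(0.0, 0.0), (30.0, 30.0)]
    return [[rng.gauss(mx, 1.0), rng.gauss(my, 1.0)]
            for mx, my in means for _ in range(n_per_pop)]


rng = random.Random(0)
sample_a, sample_b = draw_sample(rng), draw_sample(rng)
kappa = nearest_centroid_kappa(sample_a, sample_b, k=2)
print("kappa:", kappa)
```

With populations this far apart, both cluster solutions should recover the populations and kappa should be at its maximum; overlapping populations or a poorly chosen number of clusters would be expected to lower it. The brute-force label alignment is practical only for a small number of clusters.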
