An Automatic Clustering Algorithm and Its Properties in High-Dimensional Spaces
- 1 April 1972
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics
- Vol. SMC-2 (2) , 247-254
- https://doi.org/10.1109/tsmc.1972.4309100
Abstract
An economical technique for approximating a joint N-dimensional probability density function has been described by Sebestyen and Edie [20]. The algorithm searches for clusters of points and considers each cluster as one hyperellipsoidal cell in an N-dimensional histogram. Among the advantages of this scheme are: 1) the histogram cell descriptors-location, shape, and size-can be determined adaptively from sequentially introduced data samples of known classification and, 2) the number of cells required for a good fit can usually be held to a small number. No assumptions are required about the underlying statistical structure of the data. The algorithm requires three types of "control parameters" which critically affect its performance and are dependent upon the number of dimensions. The three factors control the birth, shape, and growth rate of the cells. Guides were presented in [20] for choosing the control parameter values. These guides functioned well for spaces of 3 dimensions or less, but did not yield usable values for spaces of greater dimensionality. This paper presents heuristics which were developed to automate the selection of the control parameters. The properties of these parameters were studied as a function of dimension. Two of the control parameters were found to be linearly related to dimension. This provides a method for determining their value by extrapolation, thereby avoiding a great deal of computation.Keywords
This publication has 8 references indexed in Scilit:
- Classification of benign and malignant breast tumors on the basis of 36 radiographic propertiesCancer, 1973
- A Comparison of Seven Techniques for Choosing Subsets of Pattern Recognition PropertiesIEEE Transactions on Computers, 1971
- INTRODUCTION TO BIOLOGICAL AND MECHANICAL PATTERN RECOGNITIONPublished by Elsevier ,1969
- The Construction of Hierarchic and Non-Hierarchic ClassificationsThe Computer Journal, 1968
- Current approaches to classification and clump-finding at the Cambridge Language Research UnitThe Computer Journal, 1967
- A general theory of classificatory sorting strategies: II. Clustering systemsThe Computer Journal, 1967
- A class of nonlinear recognition proceduresIEEE Transactions on Systems Science and Cybernetics, 1966
- An Algorithm for Non-Parametric Pattern RecognitionIEEE Transactions on Electronic Computers, 1966