Automatic subspace clustering of high dimensional data for data mining applications
- 1 June 1998
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 27 (2) , 94-105
- https://doi.org/10.1145/276304.276314
Abstract
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.