Automatic subspace clustering of high dimensional data for data mining applications
- 1 June 1998
- journal article
- conference paper
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 27 (2) , 94-105
- https://doi.org/10.1145/276305.276314
Abstract
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
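The abstract does not spell out how dense regions are located, so the following is only a minimal sketch of the bottom-up search for dense units in subspaces. It assumes coordinates scaled to [0, 1), a uniform grid with `xi` intervals per dimension, and a density threshold `tau` (a unit is dense when more than a fraction `tau` of the points fall in it). The function names, the unit representation, and the sample data are illustrative assumptions, and the sketch leaves out the later steps of joining dense units into clusters and producing minimized DNF descriptions.

```python
from collections import Counter
from itertools import combinations


def grid_cell(point, xi):
    """Map a point with coordinates in [0, 1) to per-dimension interval indices."""
    return tuple(min(int(x * xi), xi - 1) for x in point)


def dense_units(points, xi, tau):
    """Return {k: set of dense k-dimensional units} for a non-empty point list.

    A unit is a tuple of (dimension, interval) pairs sorted by dimension.
    """
    n = len(points)
    cells = [grid_cell(p, xi) for p in points]
    n_dims = len(cells[0])

    # Level 1: count points per interval in every single dimension.
    counts = Counter(((d, c[d]),) for c in cells for d in range(n_dims))
    current = {u for u, cnt in counts.items() if cnt / n > tau}

    result = {}
    k = 1
    while current:
        result[k] = current
        candidates = set()
        # Join step: combine units that agree on all but their last pair.
        for a, b in combinations(sorted(current), 2):
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                cand = a + (b[-1],)
                # Prune step: every k-dimensional projection must be dense.
                if all(sub in current for sub in combinations(cand, k)):
                    candidates.add(cand)
        # Count step: one pass over the data for this level's candidates.
        counts = Counter(
            cand
            for c in cells
            for cand in candidates
            if all(c[d] == i for d, i in cand)
        )
        current = {u for u, cnt in counts.items() if cnt / n > tau}
        k += 1
    return result


# Hypothetical data: eight points concentrated in dimensions 0 and 1 but
# spread out along dimension 2, plus two scattered points.
pts = [(0.1, 0.1, z) for z in (0.1, 0.3, 0.5, 0.7) for _ in range(2)]
pts += [(0.5, 0.9, 0.9), (0.9, 0.5, 0.9)]
units = dense_units(pts, xi=5, tau=0.5)
```

With this sample data the dense region is found in the two-dimensional subspace spanned by dimensions 0 and 1, while no unit involving dimension 2 passes the threshold. The pruning step relies on the fact that a unit can only be dense if all of its lower-dimensional projections are dense, which keeps the candidate set small as the dimensionality grows.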