Automatic subspace clustering of high dimensional data for data mining applications

1 June 1998

proceedings article
Published by Association for Computing Machinery (ACM)

Vol. 27 (2), 94-105
https://doi.org/10.1145/276304.276314

Abstract

Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
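
The abstract does not spell out how dense regions in subspaces are located, so the following is only a minimal sketch of one grid-based, bottom-up way to do it, not the authors' implementation. The function name `dense_units`, the parameters `xi` (intervals per dimension) and `tau` (density threshold), and the assumption that every attribute lies in [lo, hi] are illustrative choices. The sketch prunes candidate subspaces rather than individual units, and it leaves out both the step that joins adjacent dense units into clusters and the minimization of the DNF descriptions.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi, tau, lo=0.0, hi=1.0):
    """Map each subspace (tuple of dimension indices) to its dense grid units.

    A unit is a tuple of interval indices, one per dimension of the subspace.
    A unit counts as dense when the fraction of points falling in it exceeds tau.
    """
    n = len(points)
    d = len(points[0])
    width = (hi - lo) / xi

    def bin_of(value):
        # Clamp so that value == hi falls into the last interval.
        return min(int((value - lo) / width), xi - 1)

    binned = [tuple(bin_of(v) for v in p) for p in points]

    def count_dense(dims):
        counts = Counter(tuple(b[k] for k in dims) for b in binned)
        return {unit for unit, c in counts.items() if c / n > tau}

    # Level 1: dense intervals in each single dimension.
    level = {}
    for k in range(d):
        units = count_dense((k,))
        if units:
            level[(k,)] = units

    result = {}
    while level:
        result.update(level)
        subspaces = sorted(level)
        size = len(subspaces[0]) + 1
        # A subspace can only hold a dense unit if every one of its
        # lower-dimensional projections held a dense unit as well.
        candidates = set()
        for a, b in combinations(subspaces, 2):
            merged = tuple(sorted(set(a) | set(b)))
            if len(merged) == size and all(
                sub in level for sub in combinations(merged, size - 1)
            ):
                candidates.add(merged)
        level = {}
        for dims in candidates:
            units = count_dense(dims)
            if units:
                level[dims] = units
    return result


# Ten points clustered in dimensions 0 and 1, spread evenly over dimension 2.
points = [(0.15, 0.15, i / 10 + 0.05) for i in range(10)]
found = dense_units(points, xi=5, tau=0.3)
```

Because the sketch only counts points per grid cell, its output does not depend on the order of the input records, which mirrors one of the requirements listed in the abstract. In this toy data set the highest-dimensional subspace with a dense unit is (0, 1); dimension 2 contributes nothing because no interval along it holds more than 30% of the points.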
