Automatic subspace clustering of high dimensional data for data mining applications

Abstract
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
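The idea of finding dense regions in subspaces can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a grid-based notion of density (each dimension is split into xi equal-width intervals, and a cell is dense when the fraction of points in it exceeds a threshold tau) and a bottom-up search that examines a k-dimensional subspace only if all of its (k-1)-dimensional projections contain dense cells. The function names, parameter names, and the [lo, hi] data range are assumptions made for illustration. Merging adjacent dense cells into clusters and minimizing the DNF description are not shown.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi=4, tau=0.2, lo=0.0, hi=1.0):
    """Map each subspace (sorted tuple of dimension indices) to its dense cells.

    Assumes every coordinate lies in [lo, hi]. A cell is a tuple of interval
    indices, one per dimension of the subspace.
    """
    n = len(points)
    d = len(points[0])
    width = (hi - lo) / xi

    def interval(x):
        # Clamp so that x == hi falls into the last interval.
        return max(0, min(int((x - lo) / width), xi - 1))

    grid = [[interval(x) for x in p] for p in points]

    def dense_cells(dims):
        # Counting is order-independent, so the result does not depend on
        # the order in which the records are presented.
        counts = Counter(tuple(g[i] for i in dims) for g in grid)
        return {cell for cell, c in counts.items() if c / n > tau}

    # Level 1: one-dimensional subspaces that contain at least one dense cell.
    level = {}
    for i in range(d):
        cells = dense_cells((i,))
        if cells:
            level[(i,)] = cells

    result = {}
    k = 1
    while level:
        result.update(level)
        next_level = {}
        # Join subspaces that share their first k-1 dimensions.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue
            dims = a + (b[-1],)
            # Prune: every k-dimensional projection must itself be in `level`.
            if all(s in level for s in combinations(dims, k)):
                cells = dense_cells(dims)
                if cells:
                    next_level[dims] = cells
        level = next_level
        k += 1
    return result


def describe(dims, cell, xi=4, lo=0.0, hi=1.0):
    """Render one dense cell as a conjunction of interval conditions."""
    width = (hi - lo) / xi
    return " AND ".join(
        f"{lo + c * width:g} <= x{i} < {lo + (c + 1) * width:g}"
        for i, c in zip(dims, cell)
    )
```

In this sketch a cluster description would be a disjunction of such conjunctions, one per dense cell in the cluster, which is the general shape of a DNF expression. The pruning step relies on monotonicity: a cell cannot be dense in a subspace unless its projections onto every lower-dimensional subspace are dense as well.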
