Combining multiple clusterings using evidence accumulation
- 5 July 2005
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 27 (6) , 835-850
- https://doi.org/10.1109/tpami.2005.113
Abstract
We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble (a set of object partitions) is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: 1) applying different clustering algorithms and 2) applying the same clustering algorithm with different values of parameters or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization, and the individual data partitions are combined, through a voting mechanism, to generate a new n × n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm to this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the k-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.
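The voting step described in the abstract can be illustrated with a minimal sketch in plain Python. It builds the n × n co-association matrix, whose entry (i, j) is the fraction of partitions that place objects i and j in the same cluster, and then cuts a single-link agglomerative clustering of that matrix at a fixed similarity level. The function names, the toy ensemble, and the 0.5 threshold are illustrative assumptions, not taken from the paper, which may choose the linkage and the cut differently.

```python
from itertools import combinations


def co_association(partitions, n):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:  # one "vote" for the pair (i, j)
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(partitions)
    return [[1.0 if i == j else counts[i][j] / m for j in range(n)]
            for i in range(n)]


def single_link_cut(sim, threshold):
    """Single-link agglomerative clustering cut at a similarity threshold.

    Equivalent to taking the connected components of the graph that links
    every pair whose similarity exceeds the threshold (assumed cut rule).
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical ensemble: five partitions of six objects (labels are arbitrary).
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 1, 1, 2, 2, 2],
]
C = co_association(ensemble, 6)
consensus = single_link_cut(C, 0.5)
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

Because the matrix records only whether two objects share a cluster, the label values used inside each partition do not matter: the third partition above swaps labels 0 and 1 relative to the first and still contributes the same votes.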