Combining partitions by probabilistic label aggregation
- 21 August 2005
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 147-156
- https://doi.org/10.1145/1081870.1081890
Abstract
Data clustering is an important tool in exploratory data analysis. The lack of objective criteria renders model selection, as well as the identification of robust solutions, particularly difficult. Stability assessment and the combination of multiple clustering solutions are important ingredients for finding useful partitions. In this work, we propose a novel way of combining multiple clustering solutions for both hard and soft partitions: the approach is based on modeling the probability that two objects are grouped together. An efficient EM optimization strategy is employed to estimate the model parameters. Our proposal can also be extended to emphasize the signal more strongly by weighting individual base clustering solutions according to their consistency with the prediction for previously unseen objects. In addition, the probabilistic model supports an out-of-sample extension that (i) makes it possible to assign previously unseen objects to classes of the combined solution and (ii) renders the efficient aggregation of solutions possible. We also shed some light on the usefulness of such combination approaches. In the experimental results section, we demonstrate the competitive performance of our proposal in comparison with other recently proposed methods for combining multiple classifications of a finite data set.
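The quantity the abstract centers on, the probability that two objects are grouped together, can be illustrated with a simple empirical estimate. The sketch below is not the paper's EM-based model: it only counts, for hard partitions, the fraction of base clusterings in which a pair of objects shares a label (often called a co-association matrix). The function name `co_association` and the toy labelings are assumptions made for illustration.

```python
def co_association(labelings):
    """Empirical pairwise co-clustering frequencies.

    labelings: list of hard partitions, each a list of cluster labels
               (one label per object, same object order in every partition).
    Returns an n x n nested list whose entry [i][j] is the fraction of
    partitions that put objects i and j in the same cluster.
    """
    m = len(labelings)       # number of base clusterings
    n = len(labelings[0])    # number of objects
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                # Label values are arbitrary per partition; only equality matters.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


# Three hypothetical base clusterings of four objects.
base_clusterings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],   # same grouping as the first, with labels permuted
    [0, 0, 0, 1],
]
C = co_association(base_clusterings)
print(C[0][1])  # objects 0 and 1 share a cluster in every partition
print(C[2][3])  # objects 2 and 3 share a cluster in two of three partitions
```

Because only label equality within each partition is compared, the estimate is unaffected by how the individual base clusterings happen to number their clusters, which is why pairwise formulations are a natural starting point for combining partitions.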