Clustering aggregation
Top Cited Papers
- 1 March 2007
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Knowledge Discovery From Data
- Vol. 1 (1)
- https://doi.org/10.1145/1217299.1217303
Abstract
We consider the following problem: given a set of clusterings, find a single clustering that agrees as much as possible with the input clusterings. This problem, clustering aggregation , appears naturally in various contexts. For example, clustering categorical data is an instance of the clustering aggregation problem; each categorical attribute can be viewed as a clustering of the input rows where rows are grouped together if they take the same value on that attribute. Clustering aggregation can also be used as a metaclustering method to improve the robustness of clustering by combining the output of multiple algorithms. Furthermore, the problem formulation does not require a priori information about the number of clusters; it is naturally determined by the optimization function. In this article, we give a formal statement of the clustering aggregation problem, and we propose a number of algorithms. Our algorithms make use of the connection between clustering aggregation and the problem of correlation clustering . Although the problems we consider are NP-hard, for several of our methods, we provide theoretical guarantees on the quality of the solutions. Our work provides the best deterministic approximation algorithm for the variation of the correlation clustering problem we consider. We also show how sampling can be used to scale the algorithms for large datasets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.Keywords
This publication has 13 references indexed in Scilit:
- Correlation clustering in general weighted graphsTheoretical Computer Science, 2006
- Aggregating time partitionsPublished by Association for Computing Machinery (ACM) ,2006
- Aggregating inconsistent informationPublished by Association for Computing Machinery (ACM) ,2005
- INTEGRATING MICROARRAY DATA BY CONSENSUS CLUSTERINGInternational Journal on Artificial Intelligence Tools, 2004
- Correlation ClusteringMachine Learning, 2004
- Rank aggregation methods for the WebPublished by Association for Computing Machinery (ACM) ,2001
- Rock: A robust clustering algorithm for categorical attributesInformation Systems, 2000
- Geometry of Cuts and MetricsPublished by Springer Nature ,1997
- The median procedure for partitionsPublished by American Mathematical Society (AMS) ,1995
- A Best Possible Heuristic for the k-Center ProblemMathematics of Operations Research, 1985