A local search approximation algorithm for k-means clustering
- 5 June 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 28, 10-18
- https://doi.org/10.1145/513400.513402
Abstract
In k-means clustering we are given a set of n data points in d-dimensional space Rd and an integer k, and the problem is to determine a set of k points in ÓC;d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the extremely high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance.We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9+&egr;)-approximation algorithm. We show that the approximation factor is almost tight, by giving an example for which the algorithm achieves an approximation factor of (9-&egr;). To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.Keywords
This publication has 16 references indexed in Scilit:
- An efficient k-means clustering algorithm: analysis and implementationPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- CLUSTERING USING SIMULATED ANNEALING WITH PROBABILISTIC REDISTRIBUTIONInternational Journal of Pattern Recognition and Artificial Intelligence, 2001
- On Approximate Geometric k -ClusteringDiscrete & Computational Geometry, 2000
- Statistical pattern recognition: a reviewPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2000
- Data clusteringACM Computing Surveys, 1999
- Centroidal Voronoi Tessellations: Applications and AlgorithmsSIAM Review, 1999
- Approximation schemes for Euclidean k-medians and related problemsPublished by Association for Computing Machinery (ACM) ,1998
- Geometric clusteringsJournal of Algorithms, 1991
- Using simulated annealing to design good codesIEEE Transactions on Information Theory, 1987
- Optimization by Simulated AnnealingScience, 1983