Streaming-data algorithms for high-quality clustering
- 25 June 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Streaming data analysis has recently attracted attention in numerous applications including telephone records, web documents and clickstreams. For such analysis, single-pass algorithms that consume a small amount of memory are critical. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.
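The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a single-pass, bounded-memory clustering routine can look like, not the paper's method. It assumes a chunk-and-recluster scheme: each fixed-size chunk is reduced to at most k weighted centers, only those centers are retained, and the retained centers are clustered again at the end. Weighted k-means (Lloyd-style iterations) stands in as the local clusterer purely for illustration. All names (`weighted_kmeans`, `stream_cluster`, `chunk_size`) are hypothetical.

```python
import random
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the center closest to p (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def weighted_kmeans(
    points: List[Point], weights: List[float], k: int,
    iters: int = 10, seed: int = 0,
) -> Tuple[List[Point], List[float]]:
    """Lloyd-style iterations on weighted points (illustrative stand-in).

    Returns the centers that attract nonzero weight, with those weights.
    """
    if not points:
        return [], []
    rng = random.Random(seed)
    centers = rng.sample(points, min(k, len(points)))
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in centers]
        mass = [0.0] * len(centers)
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Move each center to the weighted mean of its points;
        # a center that attracted nothing stays where it is.
        centers = [
            tuple(s / m for s in sums[j]) if m > 0 else centers[j]
            for j, m in enumerate(mass)
        ]
    # Final assignment so the returned weights match the returned centers.
    mass = [0.0] * len(centers)
    for p, w in zip(points, weights):
        mass[nearest(p, centers)] += w
    kept = [(c, m) for c, m in zip(centers, mass) if m > 0]
    return [c for c, _ in kept], [m for _, m in kept]


def stream_cluster(
    stream: Iterable[Sequence[float]], k: int, chunk_size: int = 100,
) -> Tuple[List[Point], List[float]]:
    """One pass over the stream, holding one chunk plus retained centers."""
    kept_pts: List[Point] = []
    kept_wts: List[float] = []
    chunk: List[Point] = []

    def flush() -> None:
        # Summarize the current chunk as at most k weighted centers.
        if chunk:
            c, w = weighted_kmeans(chunk, [1.0] * len(chunk), k)
            kept_pts.extend(c)
            kept_wts.extend(w)
            chunk.clear()

    for p in stream:
        chunk.append(tuple(p))
        if len(chunk) == chunk_size:
            flush()
            if len(kept_pts) > chunk_size:
                # Too many retained centers: compress them to k as well.
                c, w = weighted_kmeans(kept_pts, kept_wts, k)
                kept_pts[:] = c
                kept_wts[:] = w
    flush()
    return weighted_kmeans(kept_pts, kept_wts, k)
```

Calling `stream_cluster(points, k=2, chunk_size=50)` on any iterable of equal-length numeric tuples returns the final centers and the number of stream points each one represents. Memory stays bounded by roughly one chunk plus the retained centers, which is the property the abstract calls critical; the clustering quality of this sketch depends on the stand-in local clusterer and carries no guarantee.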
This publication has 11 references indexed in Scilit:
- Improved combinatorial algorithms for the facility location and k-median problems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Primal-dual approximation algorithms for metric facility location and k-median problems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Clustering data streams. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Cure: an efficient clustering algorithm for large databases. Information Systems, 2001
- Online facility location. Published by Institute of Electrical and Electronics Engineers (IEEE), 2001
- OPTICS. Published by Association for Computing Machinery (ACM), 1999
- Approximate medians and other quantiles in one pass and with limited memory. Published by Association for Computing Machinery (ACM), 1998
- Automatic subspace clustering of high dimensional data for data mining applications. Published by Association for Computing Machinery (ACM), 1998
- BIRCH. ACM SIGMOD Record, 1996
- K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality. Published by Institute of Electrical and Electronics Engineers (IEEE), 1984