Streaming-data algorithms for high-quality clustering

25 June 2003

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Streaming data analysis has recently attracted attention in numerous applications including telephone records, web documents and clickstreams. For such analysis, single-pass algorithms that consume a small amount of memory are critical. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.
