Clustering data streams: theory and practice
- 13 May 2003
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 15 (3), 515-528
- https://doi.org/10.1109/tkde.2003.1198387
Abstract
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.