A streaming ensemble algorithm (SEA) for large-scale classification
- 26 August 2001
- proceedings article
- Published by Association for Computing Machinery (ACM)
- pp. 377-382
- https://doi.org/10.1145/502512.502568
Abstract
Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
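The chunk-based scheme described in the abstract can be illustrated with a short sketch. The Python code below is a minimal outline, assuming scikit-learn decision trees as base learners, plain majority voting, and accuracy on the incoming chunk as the replacement test; the paper's own quality-based replacement heuristic is more involved, so this should be read as a simplified approximation rather than the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class StreamingEnsemble:
    """Chunk-based ensemble sketch in the spirit of SEA.

    Assumptions not taken from the paper: scikit-learn decision trees,
    majority voting, and chunk accuracy as the replacement criterion
    (the paper uses a more refined quality heuristic). Class labels are
    assumed to be non-negative integers.
    """

    def __init__(self, max_members=25):
        self.max_members = max_members  # fixed ensemble size -> roughly constant memory
        self.members = []               # classifiers currently in the ensemble
        self.candidate = None           # classifier built on the previous chunk

    def predict(self, X):
        # Majority vote over the current members (assumes at least one member).
        votes = np.stack([m.predict(X) for m in self.members])
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

    def partial_fit_chunk(self, X, y):
        """Consume one chunk of the stream."""
        # 1. Judge the candidate trained on the previous chunk using the
        #    current chunk as fresh evaluation data.
        if self.candidate is not None:
            if len(self.members) < self.max_members:
                self.members.append(self.candidate)
            else:
                scores = [m.score(X, y) for m in self.members]
                worst = int(np.argmin(scores))
                if self.candidate.score(X, y) > scores[worst]:
                    self.members[worst] = self.candidate  # heuristic replacement
        # 2. Build a new candidate from the current chunk only; it is
        #    evaluated when the next chunk arrives.
        self.candidate = DecisionTreeClassifier().fit(X, y)
```

A driver loop would call `partial_fit_chunk` once per chunk of the stream and `predict` between chunks; because the ensemble never grows past `max_members` and old chunks are discarded, memory stays approximately constant regardless of stream length, and replacement lets the ensemble track concept drift.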