A streaming ensemble algorithm (SEA) for large-scale classification
- 26 August 2001
- proceedings article
- Published by Association for Computing Machinery (ACM)
- pp. 377-382
- https://doi.org/10.1145/502512.502568
Abstract
Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
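The chunk-based scheme described in the abstract can be illustrated with a short sketch. The Python code below is a minimal outline, assuming scikit-learn decision trees as base learners, plain majority voting, and accuracy on the incoming chunk as the replacement test; the paper's own quality-based replacement heuristic is more involved, so this should be read as a simplified approximation rather than the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class StreamingEnsemble:
    """Chunk-based ensemble sketch in the spirit of SEA.

    Assumptions not taken from the paper: scikit-learn decision trees,
    majority voting, and chunk accuracy as the replacement criterion
    (the paper uses a more refined quality heuristic). Class labels are
    assumed to be non-negative integers.
    """

    def __init__(self, max_members=25):
        self.max_members = max_members  # fixed ensemble size -> roughly constant memory
        self.members = []               # classifiers currently in the ensemble
        self.candidate = None           # classifier built on the previous chunk

    def predict(self, X):
        # Majority vote over the current members (assumes at least one member).
        votes = np.stack([m.predict(X) for m in self.members])
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

    def partial_fit_chunk(self, X, y):
        """Consume one chunk of the stream."""
        # 1. Judge the candidate trained on the previous chunk using the
        #    current chunk as fresh evaluation data.
        if self.candidate is not None:
            if len(self.members) < self.max_members:
                self.members.append(self.candidate)
            else:
                scores = [m.score(X, y) for m in self.members]
                worst = int(np.argmin(scores))
                if self.candidate.score(X, y) > scores[worst]:
                    self.members[worst] = self.candidate  # heuristic replacement
        # 2. Build a new candidate from the current chunk only; it is
        #    evaluated when the next chunk arrives.
        self.candidate = DecisionTreeClassifier().fit(X, y)
```

A driver loop would call `partial_fit_chunk` once per chunk of the stream and `predict` between chunks; because the ensemble never grows past `max_members` and old chunks are discarded, memory stays approximately constant regardless of stream length, and replacement lets the ensemble track concept drift.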