Boosting with early stopping: Convergence and consistency
Open Access
- 1 August 2005
- Journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 33 (4) , 1538-1579
- https://doi.org/10.1214/009053605000000255
Abstract
Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on cross-validation or a test set. This paper studies numerical convergence, consistency and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights to practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with ɛ→0 step-size becomes an L1-margin maximizer when left to run to convergence.
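To make the setting concrete, the sketch below illustrates greedy boosting over the linear span of a base class, with a restricted step size and early stopping chosen on a held-out set. This is a minimal illustration, not the paper's exact procedure or proofs: squared loss and regression stumps are assumed for simplicity, and the names `fit_stump` and `boost_with_early_stopping` are illustrative only.

```python
# Minimal sketch of greedy L2-boosting with a small step size and early stopping.
# Assumptions (not from the paper): squared loss, single-split regression stumps
# as the base class, and a held-out validation set used to pick the stopping time.
import numpy as np

def fit_stump(X, residual):
    """Find the single-feature threshold split that best fits the residual (least squares)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue
            pred = np.where(left, residual[left].mean(), residual[~left].mean())
            err = np.sum((residual - pred) ** 2)
            if best is None or err < best[0]:
                best = (err, j, t, residual[left].mean(), residual[~left].mean())
    _, j, t, a, b = best
    return lambda Z, j=j, t=t, a=a, b=b: np.where(Z[:, j] <= t, a, b)

def boost_with_early_stopping(X, y, X_val, y_val, step=0.1, max_iters=500):
    """Greedy stagewise boosting; return the ensemble truncated at the iteration
    with the smallest validation loss (the early-stopping rule)."""
    F_train = np.zeros(len(y))
    F_val = np.zeros(len(y_val))
    learners, best_iter, best_val = [], 0, np.inf
    for m in range(1, max_iters + 1):
        h = fit_stump(X, y - F_train)   # fit base learner to current residuals
        F_train += step * h(X)          # restricted (small) step size
        F_val += step * h(X_val)
        learners.append(h)
        val_loss = np.mean((y_val - F_val) ** 2)
        if val_loss < best_val:
            best_val, best_iter = val_loss, m
    return learners[:best_iter], step

# Usage: predictions from the early-stopped additive ensemble
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)); y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
Xv = rng.normal(size=(100, 3)); yv = np.sin(Xv[:, 0]) + 0.1 * rng.normal(size=100)
learners, step = boost_with_early_stopping(X, y, Xv, yv)
pred = step * sum(h(Xv) for h in learners)
```

The small fixed `step` mirrors the restricted step-sizes discussed in the abstract, and truncating at the best validation iteration plays the role of the early-stopping regularizer; the paper's results concern general convex losses and general basis-function classes rather than this specific setup.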