Boosting as entropy projection

6 July 1999

conference paper
Published by Association for Computing Machinery (ACM)

p. 134-144
https://doi.org/10.1145/307400.307424

Abstract

We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how Ada- Boost's choice of the new distribution can be seen as an approximate solution to the following prob- lem: Find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mis- takes of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Alternatively, we could say that AdaBoost approximately projects the distribu- tion vector onto a hyperplane defined by the mis- take vector. We show that this new view of Ada- Boost as an entropy projection is dual to the usual view of AdaBoost as minimizing the normaliza- tion factors of the updated distributions.

Keywords

This publication has 17 references indexed in Scilit:

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Journal of Computer and System Sciences, 1997
Inducing features of random fields
Published by Institute of Electrical and Electronics Engineers (IEEE) ,1997
Exponentiated Gradient versus Gradient Descent for Linear Predictors
Information and Computation, 1997
Boosting a Weak Learning Algorithm by Majority
Information and Computation, 1995
Bounds on approximate steepest descent for likelihood maximization in exponential families
IEEE Transactions on Information Theory, 1994
Entropy Optimization Principles and Their Applications
Published by Springer Nature ,1992
Why Least Squares and Maximum Entropy? An Axiomatic Approach to Inference for Linear Inverse Problems
The Annals of Statistics, 1991
General entropy criteria for inverse problems, with applications to data compression, pattern classification, and cluster analysis
IEEE Transactions on Information Theory, 1990
An iterative row-action method for interval convex programming
Journal of Optimization Theory and Applications, 1981
The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming
USSR Computational Mathematics and Mathematical Physics, 1967