Boosting as entropy projection
- 6 July 1999
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 134-144
- https://doi.org/10.1145/307400.307424
Abstract
We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how Ada- Boost's choice of the new distribution can be seen as an approximate solution to the following prob- lem: Find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mis- takes of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Alternatively, we could say that AdaBoost approximately projects the distribu- tion vector onto a hyperplane defined by the mis- take vector. We show that this new view of Ada- Boost as an entropy projection is dual to the usual view of AdaBoost as minimizing the normaliza- tion factors of the updated distributions.Keywords
This publication has 17 references indexed in Scilit:
- A Decision-Theoretic Generalization of On-Line Learning and an Application to BoostingJournal of Computer and System Sciences, 1997
- Inducing features of random fieldsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,1997
- Exponentiated Gradient versus Gradient Descent for Linear PredictorsInformation and Computation, 1997
- Boosting a Weak Learning Algorithm by MajorityInformation and Computation, 1995
- Bounds on approximate steepest descent for likelihood maximization in exponential familiesIEEE Transactions on Information Theory, 1994
- Entropy Optimization Principles and Their ApplicationsPublished by Springer Nature ,1992
- Why Least Squares and Maximum Entropy? An Axiomatic Approach to Inference for Linear Inverse ProblemsThe Annals of Statistics, 1991
- General entropy criteria for inverse problems, with applications to data compression, pattern classification, and cluster analysisIEEE Transactions on Information Theory, 1990
- An iterative row-action method for interval convex programmingJournal of Optimization Theory and Applications, 1981
- The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programmingUSSR Computational Mathematics and Mathematical Physics, 1967