Statistical behavior and consistency of classification methods based on convex risk minimization
- Open Access
- 1 February 2004
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 32 (1), 56-134
- https://doi.org/10.1214/aos/1079120130
Abstract
We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measurement of closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can generally be regarded as a (non-maximum-likelihood) conditional in-class probability estimate, and we use this analysis to compare various convex loss functions that have appeared in the literature. Furthermore, the theoretical insight allows us to design good loss functions with desirable properties. Another aspect of our analysis is to demonstrate the consistency of certain classification methods using convex risk minimization. This study sheds light on the good performance of some recently proposed linear classification methods, including boosting and support vector machines. It also shows their limitations and suggests possible improvements.

1. Motivation. In statistical machine learning, the goal is often to predict an unobserved output value y based on an observed input vector x. This requires us to estimate a functional relationship y ≈ f(x) from a set of example pairs of (x, y). Usually the quality of the predictor f(x) can be measured by a problem-dependent loss function $\ell(f(x), y)$. In machine learning analysis, one assumes that the training data are drawn from an underlying distribution D which is not known. Our goal is to find a predictor f(x) so that the expected loss of f, given below, is as small as possible:

$$L(f(\cdot)) = \mathbb{E}_{X,Y}\,\ell\big(f(X), Y\big).$$
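To make the link between convex surrogate minimization and conditional in-class probability estimation concrete, here is a minimal sketch (not taken from the paper): it fits a linear scorer by minimizing the logistic surrogate loss on synthetic data and then inverts the loss's link function to estimate P(Y = +1 | X = x). The data-generating model, learning rate, and variable names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: minimize a convex surrogate (logistic loss) of the 0-1 error
# and read off a conditional in-class probability estimate from the fitted scorer.
# The synthetic data and optimization settings below are illustrative assumptions.
rng = np.random.default_rng(0)

n = 5000
x = rng.uniform(-3.0, 3.0, size=n)
eta = 1.0 / (1.0 + np.exp(-2.0 * x))            # true P(Y = +1 | X = x)
y = np.where(rng.uniform(size=n) < eta, 1.0, -1.0)

# Linear scorer f(x) = w*x + b fitted by gradient descent on the empirical
# logistic risk (1/n) * sum_i log(1 + exp(-y_i f(x_i))), a convex upper bound
# on the classification error.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    f = w * x + b
    g = -y / (1.0 + np.exp(y * f))              # derivative of log(1 + exp(-y f)) in f
    w -= lr * np.mean(g * x)
    b -= lr * np.mean(g)

# The population minimizer of the logistic risk is f*(x) = log(eta/(1 - eta)),
# so inverting this link turns the fitted scorer into a probability estimate.
eta_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))
print("mean |eta_hat - eta|:", np.mean(np.abs(eta_hat - eta)))
```

Other convex losses covered by the analysis admit the same recipe with a different link; for instance, the exponential loss used by boosting has population minimizer (1/2) log(eta/(1 - eta)), which is one sense in which such methods act as (non-maximum-likelihood) probability estimators.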
This publication has 15 references indexed in Scilit:
- On the Bayes-risk consistency of regularized boosting methods. The Annals of Statistics, 2004
- Support Vector Machines are Universally Consistent. Journal of Complexity, 2002
- The Consistency of Greedy Algorithms for Classification. Published by Springer Nature, 2002
- A Leave-One-out Cross Validation Bound for Kernel Methods with Applications in Learning. Published by Springer Nature, 2001
- Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 2000
- Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 1998
- Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics, 1998
- A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 1997
- Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 1993
- The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 1967