Statistical behavior and consistency of classification methods based on convex risk minimization
- Open Access
- 1 February 2004
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 32 (1), 56-134
- https://doi.org/10.1214/aos/1079120130
Abstract
We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measurement of closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can generally be regarded as a (non-maximum-likelihood) conditional in-class probability estimate, and we use this analysis to compare various convex loss functions that have appeared in the literature. Furthermore, the theoretical insight allows us to design good loss functions with desirable properties. Another aspect of our analysis is to demonstrate the consistency of certain classification methods using convex risk minimization. This study sheds light on the good performance of some recently proposed linear classification methods, including boosting and support vector machines. It also shows their limitations and suggests possible improvements.

1. Motivation. In statistical machine learning, the goal is often to predict an unobserved output value y based on an observed input vector x. This requires us to estimate a functional relationship y ≈ f(x) from a set of example pairs of (x, y). Usually the quality of the predictor f(x) can be measured by a problem-dependent loss function $\ell(f(x), y)$. In machine learning analysis, one assumes that the training data are drawn from an underlying distribution D which is not known. Our goal is to find a predictor f(x) so that the expected loss of f, given below, is as small as possible:

$$L(f(\cdot)) = \mathbb{E}_{X,Y}\,\ell\big(f(X), Y\big).$$
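To make the link between convex surrogate minimization and conditional in-class probability estimation concrete, here is a minimal sketch (not taken from the paper): it fits a linear scorer by minimizing the logistic surrogate loss on synthetic data and then inverts the loss's link function to estimate P(Y = +1 | X = x). The data-generating model, learning rate, and variable names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: minimize a convex surrogate (logistic loss) of the 0-1 error
# and read off a conditional in-class probability estimate from the fitted scorer.
# The synthetic data and optimization settings below are illustrative assumptions.
rng = np.random.default_rng(0)

n = 5000
x = rng.uniform(-3.0, 3.0, size=n)
eta = 1.0 / (1.0 + np.exp(-2.0 * x))            # true P(Y = +1 | X = x)
y = np.where(rng.uniform(size=n) < eta, 1.0, -1.0)

# Linear scorer f(x) = w*x + b fitted by gradient descent on the empirical
# logistic risk (1/n) * sum_i log(1 + exp(-y_i f(x_i))), a convex upper bound
# on the classification error.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    f = w * x + b
    g = -y / (1.0 + np.exp(y * f))              # derivative of log(1 + exp(-y f)) in f
    w -= lr * np.mean(g * x)
    b -= lr * np.mean(g)

# The population minimizer of the logistic risk is f*(x) = log(eta/(1 - eta)),
# so inverting this link turns the fitted scorer into a probability estimate.
eta_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))
print("mean |eta_hat - eta|:", np.mean(np.abs(eta_hat - eta)))
```

Other convex losses covered by the analysis admit the same recipe with a different link; for instance, the exponential loss used by boosting has population minimizer (1/2) log(eta/(1 - eta)), which is one sense in which such methods act as (non-maximum-likelihood) probability estimators.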
This publication has 15 references indexed in Scilit:
- On the Bayes-risk consistency of regularized boosting methods. The Annals of Statistics, 2004
- Support Vector Machines are Universally Consistent. Journal of Complexity, 2002
- The Consistency of Greedy Algorithms for Classification. Published by Springer Nature, 2002
- A Leave-One-out Cross Validation Bound for Kernel Methods with Applications in Learning. Published by Springer Nature, 2001
- Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 2000
- Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 1998
- Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics, 1998
- A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 1997
- Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 1993
- The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 1967