Abstract
We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measurement of closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can be generally regarded as a (nonmaximum-likelihood) conditional in-class probability estimate, and we use this analysis to compare various convex loss functions that have appeared in the literature. Furthermore, the theoretical insight allows us to design good loss functions with desirable properties. Another aspect of our analysis is to demonstrate the consistency of certain classification methods using convex risk minimization. This study sheds light on the good performance of some recently proposed linear classification methods including boosting and support vector machines. It also shows their limitations and suggests possible improvements.

1. Motivation. In statistical machine learning, the goal is often to predict an unobserved output value y based on an observed input vector x. This requires us to estimate a functional relationship y ≈ f(x) from a set of example pairs of (x, y). Usually the quality of the predictor f(x) can be measured by a problem-dependent loss function ℓ(f(x), y). In machine learning analysis, one assumes that the training data are drawn from an underlying distribution D which is not known. Our goal is to find a predictor f(x) so that the expected loss of f given below is as small as possible:

$$L(f(\cdot)) = E_{X,Y}\, \ell(f(X), Y),$$