Structural risk minimization over data-dependent hierarchies
- 1 September 1998
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory
- Vol. 44 (5), 1926-1940
- https://doi.org/10.1109/18.705570
Abstract
The paper introduces some generalizations of Vapnik's (1982) method of structural risk minimization (SRM). As well as making explicit some of the details of SRM, it provides a result that allows one to trade off errors on the training sample against improved generalization performance. It then considers the more general case in which the hierarchy of classes is chosen in response to the data. A result is presented on the generalization performance of classifiers with a "large margin". This theoretically explains the impressive generalization performance of the maximal margin hyperplane algorithm of Vapnik and co-workers (which is the basis of their support vector machines). The paper concludes with a more general result in terms of "luckiness" functions, which provides a quite general way of exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets. Four examples of such functions are given, including the Vapnik-Chervonenkis (1971) dimension measured on the sample.
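The "large margin" quantity at the heart of the result above is the geometric margin: the smallest signed distance from any training point to the separating hyperplane. The sketch below (illustrative only, with made-up toy data; it is not the paper's algorithm or bound) shows how that quantity is computed for a fixed linear classifier.

```python
import math

def geometric_margin(w, b, data):
    """Smallest signed distance of any labeled point (x, y), y in {-1, +1},
    to the hyperplane w.x + b = 0. A positive value means the hyperplane
    separates the sample; larger values correspond to the 'large margins'
    whose generalization benefit the paper analyzes."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in data
    )

# Hypothetical linearly separable toy sample, for illustration only.
sample = [((2, 2), +1), ((3, 3), +1), ((-2, -2), -1), ((-3, -1), -1)]
margin = geometric_margin((1.0, 1.0), 0.0, sample)  # about 2.83 here
```

The maximal margin hyperplane algorithm chooses `w` and `b` to maximize this quantity over all separating hyperplanes; the paper's margin-based bound makes the resulting classifier's effective complexity depend on the observed margin rather than only on the hypothesis class.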