Model complexity control for regression using VC generalization bounds
- 1 January 1999
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks
- Vol. 10 (5), 1075-1089
- https://doi.org/10.1109/72.788648
Abstract
It is well known that for a given sample size there exists a model of optimal complexity corresponding to the smallest prediction (generalization) error. Hence, any method for learning from finite samples needs some provision for complexity control. Existing implementations of complexity control include penalization (or regularization), weight decay (in neural networks), and various greedy procedures (also known as constructive, growing, or pruning methods). There are numerous proposals for determining optimal model complexity (i.e., model selection) based on various (asymptotic) analytic estimates of the prediction risk and on resampling approaches. Nonasymptotic bounds on the prediction risk based on Vapnik-Chervonenkis (VC) theory have been proposed by Vapnik. This paper describes the application of VC-bounds to regression problems with the usual squared loss. An empirical study is performed for settings where the VC-bounds can be rigorously applied, i.e., linear models and penalized linear models, where the VC-dimension can be accurately estimated and the empirical risk can be reliably minimized. Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample sizes, target functions, and types of approximating functions. Our results demonstrate the advantages of VC-based complexity control with finite samples.
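The VC-based penalization described in the abstract can be sketched numerically. The sketch below uses one commonly quoted form of Vapnik's penalization factor for regression with squared loss, r(p, n) = (1 - sqrt(p - p ln p + (ln n)/(2n)))^{-1} with p = h/n, applied to polynomial regression; the polynomial-degree setting, the helper names, and the use of h = d + 1 as the VC-dimension of degree-d polynomials are illustrative assumptions, not code from the paper.

```python
import numpy as np

def vc_penalty(h, n):
    """VC penalization factor for regression with squared loss.

    One commonly quoted form (assumed here):
        r = (1 - sqrt(p - p*ln(p) + ln(n)/(2n)))^{-1},  p = h/n,
    treated as +infinity when the square-root term reaches 1.
    """
    p = h / n
    term = p - p * np.log(p) + np.log(n) / (2 * n)
    denom = 1.0 - np.sqrt(term)
    return np.inf if denom <= 0 else 1.0 / denom

def select_degree(x, y, max_degree=10):
    """Pick the polynomial degree minimizing the VC-penalized empirical risk.

    Illustrative model selection: for each candidate degree d, fit by least
    squares, compute the empirical (training) mean squared error, and scale
    it by the VC penalization factor.
    """
    n = len(x)
    best_deg, best_risk = None, np.inf
    for d in range(1, max_degree + 1):
        h = d + 1  # assumed VC-dimension: number of free parameters
        coefs = np.polyfit(x, y, d)
        emp_risk = np.mean((y - np.polyval(coefs, x)) ** 2)
        risk = emp_risk * vc_penalty(h, n)
        if risk < best_risk:
            best_deg, best_risk = d, risk
    return best_deg, best_risk
```

The penalization factor grows with the ratio of VC-dimension to sample size, so richer models must achieve a proportionally smaller empirical risk to be selected, which is the complexity-control mechanism the paper evaluates against classical criteria.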