Parameter Convergence and Learning Curves for Neural Networks
- 1 April 1999
- journal article
- Published by MIT Press in Neural Computation
- Vol. 11 (3), 747-769
- https://doi.org/10.1162/089976699300016647
Abstract
We revisit the oft-studied asymptotic (in sample size) behavior of the parameter or weight estimate returned by any member of a large family of neural network training algorithms. By properly accounting for the characteristic property of neural networks that their empirical and generalization errors possess multiple minima, we rigorously establish conditions under which the parameter estimate converges strongly to the set of minima of the generalization error. Convergence of the parameter estimate to a particular value cannot be guaranteed under our assumptions. We then evaluate the asymptotic distribution of the distance between the parameter estimate and its nearest neighbor among the set of minima of the generalization error. Results on this question have appeared numerous times and generally assert asymptotic normality, the conclusion expected from familiar statistical arguments concerned with maximum likelihood estimators. These conclusions are usually reached on the basis of somewhat informal calculations, although we shall see that the situation is more delicate. The preceding results then provide a derivation of learning curves for generalization and empirical errors that leads to bounds on rates of convergence.
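To fix ideas, here is a minimal sketch of the setting the abstract describes, in notation introduced for illustration here (the paper's own symbols and assumptions may differ). A training algorithm returns a weight estimate $\hat{w}_n$ that approximately minimizes the empirical error

$$L_n(w) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i, f(x_i; w)\bigr),$$

while the generalization error $L(w) = \mathbb{E}\,\ell\bigl(Y, f(X; w)\bigr)$ has a set of minima $W^{*} = \arg\min_w L(w)$ that, for neural networks, is typically not a singleton. Strong convergence to the set then means

$$d\bigl(\hat{w}_n, W^{*}\bigr) \;=\; \min_{w^{*} \in W^{*}} \bigl\lVert \hat{w}_n - w^{*} \bigr\rVert \;\xrightarrow{\text{a.s.}}\; 0,$$

which does not force $\hat{w}_n$ to settle on any single element of $W^{*}$. The learning-curve results in turn concern how fast quantities such as $\mathbb{E}\,L(\hat{w}_n) - \min_w L(w)$ decay with the sample size $n$.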