Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistance to Local Minima
- 1 March 1994
- journal article
- Published by MIT Press in Neural Computation
- Vol. 6 (2), 285-295
- https://doi.org/10.1162/neco.1994.6.2.285
Abstract
In this paper we discuss the asymptotic properties of the most commonly used variant of the backpropagation algorithm, in which network weights are trained by local gradient descent on examples drawn randomly from a fixed training set, and the learning rate η of the gradient updates is held constant (simple backpropagation). Using stochastic approximation results, we show that for η → 0 this training process approaches batch training. Further, we show that for small η, simple backpropagation can be approximated by the sum of a batch training process and a Gaussian diffusion, which is the unique solution of a linear stochastic differential equation. Using this approximation, we indicate why simple backpropagation is less likely to get stuck in local minima than the batch training process, and we demonstrate this empirically on a number of examples.
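The abstract contrasts two training schemes: "simple backpropagation", which updates the weights after each randomly drawn example with a constant learning rate η, and batch training, which updates on the gradient averaged over the whole training set. The following is a minimal sketch of that contrast on a toy least-squares problem; it is not the paper's experimental setup, and all names (grad_on_example, simple_backprop, batch_training) are illustrative.

```python
# Sketch: per-example constant-learning-rate updates ("simple backpropagation")
# versus batch gradient descent, on a toy linear least-squares model.
import numpy as np

rng = np.random.default_rng(0)

# Fixed toy training set: fit w to noisy linear targets.
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def grad_on_example(w, x_i, y_i):
    # Gradient of the per-example loss 0.5 * (x_i @ w - y_i)**2 with respect to w.
    return (x_i @ w - y_i) * x_i

def simple_backprop(w, eta=0.05, steps=2000):
    # Examples drawn at random from the fixed training set; eta held constant.
    for _ in range(steps):
        i = rng.integers(len(X))
        w = w - eta * grad_on_example(w, X[i], y[i])
    return w

def batch_training(w, eta=0.05, steps=2000):
    # Deterministic counterpart: step along the gradient averaged over all examples.
    for _ in range(steps):
        g = np.mean([grad_on_example(w, X[i], y[i]) for i in range(len(X))], axis=0)
        w = w - eta * g
    return w

w0 = np.zeros(3)
print("simple backpropagation:", simple_backprop(w0.copy()))
print("batch training:        ", batch_training(w0.copy()))
```

The random choice of example makes each simple-backpropagation step a noisy version of the batch step; for small η the iterates fluctuate around the batch trajectory, which is the behavior the paper's Gaussian diffusion approximation captures and uses to explain the greater resistance to local minima.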