Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistance to Local Minima
- 1 March 1994
- journal article
- Published by MIT Press in Neural Computation
- Vol. 6 (2), 285-295
- https://doi.org/10.1162/neco.1994.6.2.285
Abstract
In this paper we discuss the asymptotic properties of the most commonly used variant of the backpropagation algorithm, in which network weights are trained by local gradient descent on examples drawn randomly from a fixed training set, and the learning rate η of the gradient updates is held constant (simple backpropagation). Using stochastic approximation results, we show that for η → 0 this training process approaches batch training. Further, we show that for small η, simple backpropagation can be approximated by the sum of a batch training process and a Gaussian diffusion, which is the unique solution of a linear stochastic differential equation. Using this approximation, we indicate why simple backpropagation is less likely to get stuck in local minima than the batch training process, and we demonstrate this empirically on a number of examples.
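The abstract contrasts two training schemes: "simple backpropagation", which updates the weights after each randomly drawn example with a constant learning rate η, and batch training, which updates on the gradient averaged over the whole training set. The following is a minimal sketch of that contrast on a toy least-squares problem; it is not the paper's experimental setup, and all names (grad_on_example, simple_backprop, batch_training) are illustrative.

```python
# Sketch: per-example constant-learning-rate updates ("simple backpropagation")
# versus batch gradient descent, on a toy linear least-squares model.
import numpy as np

rng = np.random.default_rng(0)

# Fixed toy training set: fit w to noisy linear targets.
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def grad_on_example(w, x_i, y_i):
    # Gradient of the per-example loss 0.5 * (x_i @ w - y_i)**2 with respect to w.
    return (x_i @ w - y_i) * x_i

def simple_backprop(w, eta=0.05, steps=2000):
    # Examples drawn at random from the fixed training set; eta held constant.
    for _ in range(steps):
        i = rng.integers(len(X))
        w = w - eta * grad_on_example(w, X[i], y[i])
    return w

def batch_training(w, eta=0.05, steps=2000):
    # Deterministic counterpart: step along the gradient averaged over all examples.
    for _ in range(steps):
        g = np.mean([grad_on_example(w, X[i], y[i]) for i in range(len(X))], axis=0)
        w = w - eta * g
    return w

w0 = np.zeros(3)
print("simple backpropagation:", simple_backprop(w0.copy()))
print("batch training:        ", batch_training(w0.copy()))
```

The random choice of example makes each simple-backpropagation step a noisy version of the batch step; for small η the iterates fluctuate around the batch trajectory, which is the behavior the paper's Gaussian diffusion approximation captures and uses to explain the greater resistance to local minima.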