Learning by on-line gradient descent

7 February 1995

Journal article
Published by IOP Publishing in Journal of Physics A: General Physics

Vol. 28 (3), 643-656
https://doi.org/10.1088/0305-4470/28/3/018

Abstract

We study on-line gradient-descent learning in multilayer networks analytically and numerically. The training is based on randomly drawn inputs and their corresponding outputs as defined by a target rule. In the thermodynamic limit we derive deterministic differential equations for the order parameters of the problem, which allow an exact calculation of the evolution of the generalization error. First, we consider a single-layer perceptron with sigmoidal activation function learning a target rule defined by a network of the same architecture. For this model the generalization error decays exponentially with the number of training examples if the learning rate is sufficiently small. However, if the learning rate is increased above a critical value, perfect learning is no longer possible. For architectures with hidden layers and fixed hidden-to-output weights, such as the parity machine and the committee machine, we find additional effects related to the existence of symmetries in these problems.
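The single-layer student-teacher setup described in the abstract can be illustrated with a small finite-N simulation. This is a sketch under stated assumptions, not the authors' code. It assumes the sigmoidal activation g(x) = erf(x/sqrt(2)), isotropic Gaussian inputs xi, the quadratic error eps = (g(h) - g(b))^2 / 2 with fields h = w.xi/sqrt(N) and b = B.xi/sqrt(N), and it tracks the conventional order parameters R = w.B/N, Q = w.w/N and T = B.B/N. For this particular activation the generalization error has a closed form in R, Q and T, which generalization_error evaluates. The names simulate and generalization_error, the system size N, the learning rate eta and the initialization are hypothetical, illustrative choices.

```python
import math
import random


def g(x):
    # Sigmoidal activation; erf(x / sqrt(2)) is an assumption of this sketch.
    return math.erf(x / math.sqrt(2.0))


def g_prime(x):
    # Derivative of g: d/dx erf(x / sqrt(2)) = sqrt(2 / pi) * exp(-x^2 / 2).
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def generalization_error(R, Q, T):
    # eps_g = <(g(h) - g(b))^2> / 2 over Gaussian inputs, expressed through
    # the order parameters; this closed form holds only for g = erf(x/sqrt(2)).
    return (math.asin(Q / (1.0 + Q))
            + math.asin(T / (1.0 + T))
            - 2.0 * math.asin(R / math.sqrt((1.0 + Q) * (1.0 + T)))) / math.pi


def simulate(N=100, eta=1.0, alpha_max=30.0, seed=0):
    """Hypothetical helper: on-line gradient descent of a student perceptron w
    learning a teacher perceptron B. Returns (alpha, eps_g) pairs, alpha = mu / N."""
    rng = random.Random(seed)
    B = [rng.gauss(0.0, 1.0) for _ in range(N)]   # teacher weights, T ~ 1
    w = [rng.gauss(0.0, 0.1) for _ in range(N)]   # small random student
    T = dot(B, B) / N
    sqrt_n = math.sqrt(N)

    def current_error():
        # Reads the latest binding of w from the enclosing scope.
        return generalization_error(dot(w, B) / N, dot(w, w) / N, T)

    history = []
    steps = int(alpha_max * N)
    for mu in range(steps):
        if mu % N == 0:                       # record once per unit of alpha
            history.append((mu / N, current_error()))
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]  # fresh random example
        h = dot(w, xi) / sqrt_n               # student field
        b = dot(B, xi) / sqrt_n               # teacher field (target output g(b))
        # Gradient of eps = (g(h) - g(b))^2 / 2 w.r.t. w is delta * xi / sqrt(N).
        delta = (g(h) - g(b)) * g_prime(h)
        w = [wi - eta * delta * x / sqrt_n for wi, x in zip(w, xi)]
    history.append((steps / N, current_error()))
    return history


if __name__ == "__main__":
    for alpha, eps in simulate()[::5]:
        print(f"alpha = {alpha:5.1f}   eps_g = {eps:.2e}")
```

With the default eta = 1.0, eps_g should fall steadily as alpha = mu/N grows. Rerunning with larger values of eta is one way to probe the critical learning rate mentioned in the abstract, although this sketch makes no claim about its exact value. Because N is finite, individual runs fluctuate around the deterministic N -> infinity dynamics that the paper derives.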
