A unified framework for gradient algorithms used for filter adaptation and neural network training

Abstract
In this paper we present, in a unified framework, the gradient algorithms employed in the adaptation of linear time filters (TF) and the supervised training of (non-linear) neural networks (NN). The optimality criteria used to optimize the parameters H of the filter or network are the least squares (LS) and least mean squares (LMS) criteria in both contexts. They respectively minimize the total or the mean square of the error e(k) between an (output) reference sequence d(k) and the actual system output y(k) corresponding to the input X(k). Minimization is performed iteratively by a gradient algorithm. The index k in (TF) is time and runs indefinitely, so iterations start as soon as reception of X(k) begins: the recursive algorithm for the adaptation H(k-1) → H(k) of the parameters is applied each time a new input X(k) is observed. When training an (NN) with a finite number of examples, the index k denotes the example and is upper-bounded. Iterative (block) algorithms wait until all K examples have been received before updating the network. However, K being frequently very large, recursive algorithms are also often preferred in (NN) training, but they raise the question of how to order the examples X(k).

Except in the specific case of a transversal filter, there is no general recursive technique for optimizing the LS criterion. However, X(k) is normally a stationary random sequence; thus LS and LMS are equivalent when k becomes large. Moreover, the LMS criterion can always be minimized recursively with the help of the stochastic LMS gradient algorithm, which has low computational complexity.

In (TF), X(k) is a sliding window of (time) samples, whereas in the supervised training of an (NN) with arbitrarily ordered examples, X(k-1) and X(k) have nothing to do with each other. When this (major) difference is removed by plugging a time signal into the network input, the recursive algorithms recently developed for (NN) training become similar to those of adaptive filtering. In this context the present paper displays the similarities between adaptive cascaded linear filters and trained multilayer networks. It is also shown that there is a close similarity between adaptive recursive filters and neural networks including feedback loops.

The classical filtering approach is to evaluate the gradient by 'forward propagation', whereas the most popular (NN) training method uses gradient backward propagation. We show that when a linear (TF) problem is implemented by an (NN), the two approaches are equivalent. However, the backward method can be used for more general (non-linear) filtering problems. Conversely, new insights can be drawn in the (NN) context from a gradient forward computation.

The advantage of the (NN) framework, and in particular of the gradient backward propagation approach, is evidently that it has a much larger spectrum of applications than (TF), since (i) the inputs are arbitrary and (ii) the (NN) can perform non-linear (TF).
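To make the correspondence concrete, the following is a minimal sketch (not the authors' code) of the stochastic LMS gradient update for a transversal filter, which can equally be read as stochastic-gradient training of a single-layer linear network on the sliding-window input X(k). The names H, mu, taps and the system-identification setup are illustrative assumptions.

```python
import numpy as np

# Sketch: recursive LMS adaptation H(k-1) -> H(k) of a transversal filter,
# equivalently stochastic-gradient training of a single linear "neuron".
rng = np.random.default_rng(0)

taps = 4                      # filter length (number of parameters in H)
mu = 0.05                     # gradient step size
H = np.zeros(taps)            # adaptive parameters H(k)

# Hypothetical unknown system to identify: d(k) is its response to the input.
H_true = np.array([0.6, -0.3, 0.2, 0.1])
x = rng.standard_normal(1000)             # stationary random input sequence
d = np.convolve(x, H_true)[: len(x)]      # reference (desired) output d(k)

for k in range(taps, len(x)):
    X_k = x[k - taps + 1 : k + 1][::-1]   # sliding window X(k) of past samples
    y_k = H @ X_k                         # filter / network output y(k)
    e_k = d[k] - y_k                      # error e(k) = d(k) - y(k)
    # Stochastic LMS gradient step: O(taps) operations per new sample
    H = H + mu * e_k * X_k

print("estimated H:", np.round(H, 3))
print("true      H:", H_true)
```

In this linear single-layer case the 'forward' gradient computation of adaptive filtering and the backward-propagation rule coincide, which is the equivalence the abstract refers to; the backward method only becomes distinct once non-linear layers or feedback loops are introduced.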
