A unified framework for gradient algorithms used for filter adaptation and neural network training

Abstract
In this paper we present, in a unified framework, the gradient algorithms employed in the adaptation of linear time filters (TF) and the supervised training of (non-linear) neural networks (NN). The optimality criteria used to optimize the parameters H of the filter or network are the least squares (LS) and least mean squares (LMS) criteria in both contexts. They respectively minimize the total or the mean square of the error e(k) between an (output) reference sequence d(k) and the actual system output y(k) corresponding to the input X(k). Minimization is performed iteratively by a gradient algorithm. The index k in (TF) is time and runs indefinitely, so iterations start as soon as reception of X(k) begins: the recursive algorithm for the adaptation H(k-1) → H(k) of the parameters is applied each time a new input X(k) is observed. When training an (NN) with a finite number of examples, the index k denotes the example and is upper-bounded. Iterative (block) algorithms wait until all K examples have been received before updating the network. However, K being frequently very large, recursive algorithms are also often preferred in (NN) training, but they raise the question of how to order the examples X(k).

Except in the specific case of a transversal filter, there is no general recursive technique for optimizing the LS criterion. However, X(k) is normally a stationary random sequence; thus LS and LMS are equivalent when k becomes large. Moreover, the LMS criterion can always be minimized recursively with the help of the stochastic LMS gradient algorithm, which has low computational complexity.

In (TF), X(k) is a sliding window of (time) samples, whereas in the supervised training of an (NN) with arbitrarily ordered examples, X(k-1) and X(k) have nothing to do with each other. When this (major) difference is removed by plugging a time signal into the network input, the recursive algorithms recently developed for (NN) training become similar to those of adaptive filtering. In this context the present paper displays the similarities between adaptive cascaded linear filters and trained multilayer networks. It is also shown that there is a close similarity between adaptive recursive filters and neural networks including feedback loops.

The classical filtering approach is to evaluate the gradient by 'forward propagation', whereas the most popular (NN) training method uses gradient backward propagation. We show that when a linear (TF) problem is implemented by an (NN), the two approaches are equivalent. However, the backward method can be used for more general (non-linear) filtering problems. Conversely, new insights can be drawn in the (NN) context from a gradient forward computation.

The advantage of the (NN) framework, and in particular of the gradient backward propagation approach, is evidently that it has a much larger spectrum of applications than (TF), since (i) the inputs are arbitrary and (ii) the (NN) can perform non-linear (TF).
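To make the correspondence concrete, the following is a minimal sketch (not the authors' code) of the stochastic LMS gradient update for a transversal filter, which can equally be read as stochastic-gradient training of a single-layer linear network on the sliding-window input X(k). The names H, mu, taps and the system-identification setup are illustrative assumptions.

```python
import numpy as np

# Sketch: recursive LMS adaptation H(k-1) -> H(k) of a transversal filter,
# equivalently stochastic-gradient training of a single linear "neuron".
rng = np.random.default_rng(0)

taps = 4                      # filter length (number of parameters in H)
mu = 0.05                     # gradient step size
H = np.zeros(taps)            # adaptive parameters H(k)

# Hypothetical unknown system to identify: d(k) is its response to the input.
H_true = np.array([0.6, -0.3, 0.2, 0.1])
x = rng.standard_normal(1000)             # stationary random input sequence
d = np.convolve(x, H_true)[: len(x)]      # reference (desired) output d(k)

for k in range(taps, len(x)):
    X_k = x[k - taps + 1 : k + 1][::-1]   # sliding window X(k) of past samples
    y_k = H @ X_k                         # filter / network output y(k)
    e_k = d[k] - y_k                      # error e(k) = d(k) - y(k)
    # Stochastic LMS gradient step: O(taps) operations per new sample
    H = H + mu * e_k * X_k

print("estimated H:", np.round(H, 3))
print("true      H:", H_true)
```

In this linear single-layer case the 'forward' gradient computation of adaptive filtering and the backward-propagation rule coincide, which is the equivalence the abstract refers to; the backward method only becomes distinct once non-linear layers or feedback loops are introduced.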
