Diagrammatic Derivation of Gradient Algorithms for Neural Networks

Abstract
Deriving gradient algorithms for time-dependent neural network structures typically requires numerous chain rule expansions, diligent bookkeeping, and careful manipulation of terms. In this paper, we show how to derive such algorithms via a set of simple block diagram manipulation rules. The approach provides a common framework to derive popular algorithms including backpropagation and backpropagation-through-time without a single chain rule expansion. Additional examples are provided for a variety of complicated architectures to illustrate both the generality and the simplicity of the approach.