Visual Recognition and Inference Using Dynamic Overcomplete Sparse Learning

1 September 2007

journal article
Published by MIT Press in Neural Computation

Vol. 19 (9), 2301-2352
https://doi.org/10.1162/neco.2007.19.9.2301

Abstract

We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.
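The abstract's remark about updating only a small subset of a large weight matrix can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a plain outer-product gradient step for a single linear layer, and the function name `sparse_outer_update`, the layer sizes, and the learning rate are hypothetical choices made for illustration. When both the input activity `x` and the error signal `delta` are sparse, the update `lr * outer(delta, x)` is nonzero only where an active row meets an active column, so only those entries need to be written.

```python
import numpy as np

def sparse_outer_update(W, delta, x, lr=0.1):
    """Apply W -= lr * outer(delta, x) in place, touching only entries
    whose row has a nonzero delta and whose column has a nonzero x.
    Returns the number of weight entries that were written.
    (Hypothetical helper for illustration; not taken from the paper.)"""
    rows = np.flatnonzero(delta)   # indices of units with a nonzero error
    cols = np.flatnonzero(x)       # indices of active (nonzero) inputs
    if rows.size and cols.size:
        W[np.ix_(rows, cols)] -= lr * np.outer(delta[rows], x[cols])
    return rows.size * cols.size

# Assumed toy sizes: a 200 x 400 weight matrix with very sparse activity.
rng = np.random.default_rng(0)
n_out, n_in = 200, 400
W = rng.normal(size=(n_out, n_in))

x = np.zeros(n_in)
x[[3, 50, 399]] = [1.0, -0.5, 2.0]      # 3 active inputs
delta = np.zeros(n_out)
delta[[7, 120]] = [0.2, -1.0]           # 2 units with nonzero error

# Dense reference update, computed before W is modified in place.
W_dense = W - 0.1 * np.outer(delta, x)

touched = sparse_outer_update(W, delta, x, lr=0.1)
matches_dense = np.allclose(W, W_dense)
```

In this toy setting the sparse step writes 6 of the 80,000 weights and reproduces the dense update exactly, which is the kind of saving the abstract alludes to for large, sparsely active layers.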
