Connected-digit speaker-dependent speech recognition using a neural network with time-delayed connections

1 March 1991

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Signal Processing

Vol. 39 (3), 698-713
https://doi.org/10.1109/78.80888

Abstract

An analog neural network that can be taught to recognize stimulus sequences is used to recognize the digits in connected speech. The circuit computes in the analog domain, using linear circuits for signal filtering and nonlinear circuits for simple decisions, feature extraction, and noise suppression. An analog perceptron learning rule is used to organize the subset of connections used in the circuit that are specific to the chosen vocabulary. Computer simulations of the learning algorithm and circuit demonstrate recognition scores above 99% for a single-speaker connected-digit database. There is no clock. The circuit is data driven, and there is no need for endpoint detection or segmentation of the speech signal during recognition. Training in the presence of noise provides noise immunity up to the trained level. For the speech problem studied, the circuit connections need only be accurate to about 3 bits of digitization depth for optimum performance. The algorithm used maps efficiently onto analog neural network hardware.
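The remark about connection accuracy can be made concrete with a small sketch. The code below is an illustration only, not the authors' procedure: it assumes uniform quantization over the observed weight range, and the function name `quantize_weights` and the random weight vector are hypothetical. It shows what limiting connection strengths to 3 bits (8 levels) means numerically.

```python
import numpy as np

def quantize_weights(weights, bits=3):
    """Round each weight to the nearest of 2**bits evenly spaced levels
    spanning [min(weights), max(weights)]. Uniform quantization is an
    assumption made for illustration; the abstract does not specify one."""
    w = np.asarray(weights, dtype=float)
    levels = 2 ** bits
    lo, hi = w.min(), w.max()
    if hi == lo:
        # All weights identical: nothing to quantize.
        return w.copy()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((w - lo) / step) * step

# Hypothetical connection strengths, standing in for trained weights.
rng = np.random.default_rng(0)
w = rng.normal(size=64)
wq = quantize_weights(w, bits=3)

# Spacing between adjacent levels and the worst-case rounding error.
step = (w.max() - w.min()) / (2 ** 3 - 1)
max_err = np.abs(w - wq).max()
print(len(np.unique(wq)), "distinct levels; max error", max_err)
```

Under this assumed scheme, each quantized weight differs from the original by at most half the level spacing, which is the sense in which a connection is "accurate to 3 bits".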
