Speaker-independent recognition of connected utterances using recurrent and non-recurrent neural networks

Abstract

Connectionist learning procedures are applied to speaker-independent continuous speech recognition. The resulting system achieved a recognition rate of 97% correct in preliminary tests on the Texas Instruments/National Bureau of Standards Connected Digits Database. Two versions of the system were implemented, both using four-layer backpropagation networks. One used a static (nonrecurrent) network with a history mechanism in which the input weights were slaved together, as in time-delay neural networks (TDNNs); the other used a recurrent connection structure similar to that proposed by J.L. Elman (Tech. Rep., Univ. of California, San Diego, April 1988). The final recognition accuracies of the two approaches were not significantly different. The networks generated and refined hypotheses about the identity of utterances over successive intervals. These hypotheses were used as input to a Markov-chain-based Viterbi recognizer, which produced a final identification of the entire utterance.
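
The final stage, in which per-interval network hypotheses are passed to a Viterbi recognizer, can be illustrated with a generic Viterbi decoder. This is a minimal sketch and not the system described in the abstract: the two-state model, the transition and initial probabilities, and the per-frame scores are hypothetical values chosen for illustration, and the function name `viterbi` is an assumption.

```python
import math


def viterbi(frame_probs, trans, init):
    """Return the most likely state sequence for a Markov chain.

    frame_probs[t][s]: score for state s at frame t (stand-in for network output)
    trans[i][j]:       probability of moving from state i to state j
    init[s]:           probability of starting in state s
    All values here are hypothetical; they are not taken from the paper.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(init)
    # score[s] = best log-score of any path ending in state s at the current frame
    score = [log(init[s]) + log(frame_probs[0][s]) for s in range(n_states)]
    back = []  # back[t - 1][s] = best predecessor of state s at frame t

    for probs in frame_probs[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log(trans[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + log(trans[best_i][j]) + log(probs[j]))
        score = new_score
        back.append(pointers)

    # Trace back from the best-scoring final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical per-frame scores for two states; frame 1 is a noisy outlier.
frame_probs = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
trans = [[0.9, 0.1], [0.1, 0.9]]  # states tend to persist between frames
init = [0.6, 0.4]

path = viterbi(frame_probs, trans, init)
print(path)  # [0, 0, 0, 1, 1]
```

Taking the highest-scoring state frame by frame would give `[1]` at frame 1. Because the transition probabilities favor staying in the same state, the decoder overrides that isolated frame, which shows how sequence-level decoding can refine local hypotheses into one identification of the whole utterance.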
