Abstract
Connectionist learning procedures are applied to the task of speaker-independent continuous speech recognition, yielding a system that achieved a recognition rate of 97% correct in preliminary tests on the Texas Instruments/National Bureau of Standards Connected Digits Database. Two versions of the system were implemented, both using four-layer backpropagation networks. One used a static (nonrecurrent) network with a history mechanism in which the input weights were slaved together, as they are in time-delay neural networks (TDNNs); the other used a recurrent connection structure similar to that proposed by J.L. Elman (Tech. Rep., Univ. of California, San Diego, April 1988). The final recognition accuracies of the two approaches were not significantly different. The networks generated and refined hypotheses about the identity of utterances over successive intervals. These hypotheses were used as input to a Markov-chain-based Viterbi recognizer, which produced a final identification of the entire utterance.
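
The final stage described above, in which per-interval network hypotheses are passed to a Markov-chain-based Viterbi recognizer, can be illustrated with a minimal sketch. This is not the authors' recognizer: the function name `viterbi`, the two-state model, and all probabilities below are hypothetical and chosen only to show how Viterbi decoding combines frame-level scores with transition probabilities to produce one consistent label sequence.

```python
import math


def viterbi(frame_scores, log_trans, log_init):
    """Return the highest-scoring state sequence (generic Viterbi decoding).

    frame_scores[t][s]: log score assigned to state s at interval t
                        (standing in for a network's hypothesis; assumed).
    log_trans[p][s]:    log probability of moving from state p to state s.
    log_init[s]:        log probability of starting in state s.
    """
    n_states = len(log_init)
    # best[s] is the score of the best path that ends in state s so far.
    best = [log_init[s] + frame_scores[0][s] for s in range(n_states)]
    backptrs = []
    for scores in frame_scores[1:]:
        new_best, ptrs = [], []
        for s in range(n_states):
            # Pick the predecessor that maximizes the path score into s.
            prev = max(range(n_states), key=lambda p: best[p] + log_trans[p][s])
            new_best.append(best[prev] + log_trans[prev][s] + scores[s])
            ptrs.append(prev)
        best = new_best
        backptrs.append(ptrs)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def to_log(rows):
    """Convert a table of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical per-interval scores for two states; interval 1 is a noisy outlier.
frames = to_log([[0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.1, 0.9], [0.2, 0.8]])
trans = to_log([[0.9, 0.1], [0.1, 0.9]])  # states tend to persist
init = [math.log(0.5), math.log(0.5)]

print(viterbi(frames, trans, init))  # → [0, 0, 0, 1, 1]
```

Taking the best-scoring state at each interval independently would give `[0, 1, 0, 1, 1]`; the transition model smooths the isolated outlier at interval 1, which is the kind of sequence-level correction a Viterbi stage provides on top of frame-level hypotheses.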
