Integrating time alignment and neural networks for high performance continuous speech recognition

1 January 1991

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 105-108 vol.1
https://doi.org/10.1109/icassp.1991.150289

Abstract

The authors describe two systems in which neural network classifiers are merged with dynamic programming (DP) time alignment methods to produce high-performance continuous speech recognizers. One system uses the connectionist Viterbi training (CVT) procedure, in which a neural network with frame-level outputs is trained using guidance from a time alignment procedure. The other system uses multi-state time-delay neural networks (MS-TDNNs), in which embedded DP time alignment allows network training with only word-level external supervision. The CVT results on the TI Digits database are 99.1% word accuracy and 98.0% string accuracy. The MS-TDNNs are described in detail, with attention focused on their architecture, the training procedure, and the results of applying them to continuous speaker-dependent alphabet recognition: on two speakers, word accuracy is 97.5% and 89.7%, respectively.
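
To make the idea of DP time alignment over frame-level network outputs concrete, the following is a minimal sketch and not the authors' implementation. It assumes a network has already produced per-frame log-scores for each class, and that the utterance is modeled as a strict left-to-right sequence of states in which each frame either stays in the current state or advances to the next. The function name `viterbi_align`, the score layout, and the stay-or-advance topology are all assumptions made for illustration.

```python
import math


def viterbi_align(log_scores, state_seq):
    """Align frames to a left-to-right state sequence by dynamic programming.

    log_scores: list over frames; each entry is a list of per-class log-scores
                (hypothetical network outputs, assumed for this sketch).
    state_seq:  class indices in the order they must occur.
    Returns (path, total), where path[t] is the position in state_seq assigned
    to frame t and total is the summed log-score of the best alignment.
    """
    num_frames = len(log_scores)
    num_states = len(state_seq)
    if num_states == 0 or num_frames < num_states:
        raise ValueError("need at least one state and one frame per state")

    neg_inf = -math.inf
    best = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # The alignment must start in the first state.
    best[0][0] = log_scores[0][state_seq[0]]

    for t in range(1, num_frames):
        for s in range(num_states):
            stay = best[t - 1][s]
            advance = best[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                prev, score = s - 1, advance
            else:
                prev, score = s, stay
            if score == neg_inf:
                continue  # state s is not reachable at frame t
            best[t][s] = score + log_scores[t][state_seq[s]]
            back[t][s] = prev

    # The alignment must end in the last state; trace back from there.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[num_frames - 1][num_states - 1]


def frame_targets(path, state_seq):
    """Turn an alignment path into one class label per frame."""
    return [state_seq[s] for s in path]
```

In a Viterbi-style training loop, per-frame labels such as those returned by `frame_targets` could serve as training targets for the frame-level network, with alignment and training then alternating. That loop is a general pattern and is not taken from the paper's text.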
