Integrating time alignment and neural networks for high performance continuous speech recognition
- 1 January 1991
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 105-108 vol.1
- https://doi.org/10.1109/icassp.1991.150289
Abstract
The authors describe two systems in which neural network classifiers are merged with dynamic programming (DP) time alignment methods to produce high-performance continuous speech recognizers. One system uses the connectionist Viterbi-training (CVT) procedure, in which a neural network with frame-level outputs is trained using guidance from a time alignment procedure. The other system uses multi-state time-delay neural networks (MS-TDNNs), in which embedded DP time alignment allows network training with only word-level external supervision. The CVT results on the TI Digits are 99.1% word accuracy and 98.0% string accuracy. The MS-TDNNs are described in detail, with attention focused on their architecture, the training procedure, and the results of applying them to continuous speaker-dependent alphabet recognition: word accuracies on two speakers are 97.5% and 89.7%, respectively.
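The abstract does not give the alignment procedure itself, so the following is only a generic sketch of DP time alignment over frame-level classifier outputs, not the authors' CVT or MS-TDNN implementation. It assumes a strictly left-to-right state sequence in which each state covers at least one frame; the function name `viterbi_align` and the toy scores are hypothetical.

```python
import numpy as np


def viterbi_align(log_scores, state_seq):
    """Align T frames to a left-to-right state sequence by dynamic programming.

    log_scores: (T, K) array of frame-level log scores for K classes
                (for example, log outputs of a frame-level classifier).
    state_seq:  class indices to visit in order; each covers >= 1 frame.
    Returns (best_score, path), where path[t] is the position in state_seq
    assigned to frame t.
    """
    log_scores = np.asarray(log_scores, dtype=float)
    T, S = log_scores.shape[0], len(state_seq)
    if T < S:
        raise ValueError("need at least one frame per state")

    emit = log_scores[:, state_seq]          # (T, S) score of state s at frame t
    dp = np.full((T, S), -np.inf)            # best cumulative score ending in (t, s)
    back = np.zeros((T, S), dtype=int)       # predecessor state position
    dp[0, 0] = emit[0, 0]                    # the path must start in the first state

    for t in range(1, T):
        for s in range(S):
            stay = dp[t - 1, s]
            move = dp[t - 1, s - 1] if s > 0 else -np.inf
            if move > stay:
                dp[t, s], back[t, s] = move + emit[t, s], s - 1
            else:
                dp[t, s], back[t, s] = stay + emit[t, s], s

    # Backtrace from the final state at the final frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return float(dp[T - 1, S - 1]), path


# Toy example (made-up scores): two classes, four frames.
scores = np.log([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]])
best, path = viterbi_align(scores, [0, 1])
```

In a hybrid setup of the kind the abstract describes, the frame-to-state assignment in `path` could serve as frame-level training targets for the network, while the total score could serve as a word-level or sequence-level score.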
This publication has 7 references indexed in Scilit:
- Speaker-independent word recognition using dynamic programming neural networks. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Phonetically sensitive discriminants for improved speech recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Consonant recognition by modular construction of large phonemic time-delay neural networks. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Connectionist Viterbi training: a new hybrid method for continuous speech recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Fast back-propagation learning methods for large phonemic neural networks. Published by International Speech Communication Association, 1989
- The Acoustic-Modeling Problem in Automatic Speech Recognition. Published by Defense Technical Information Center (DTIC), 1987
- The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984