Non-deterministic stochastic language models for speech recognition
- 19 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1 (ISSN 1520-6149), 237-240
- https://doi.org/10.1109/icassp.1995.479408
Abstract
Traditional stochastic language models for speech recognition (i.e., n-grams) are deterministic, in the sense that there is one and only one derivation for each given sentence. Moreover, a fixed temporal window is always assumed in the estimation of traditional stochastic language models. This paper shows how non-determinism is introduced to effectively approximate a back-off n-gram language model through a finite state network formalism. It also shows that a new, flexible and powerful network formalization can be obtained by relaxing the assumption of a fixed history size. As a result, a class of automata for language modeling, variable n-gram stochastic automata (VNSAs), is obtained, for which we propose some methods for the estimation of the transition probabilities. VNSAs have been used in a spontaneous speech recognizer for the ATIS task. The accuracy on a standard test set is presented.
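The idea of approximating a back-off n-gram with a non-deterministic finite state network can be illustrated with a small sketch. This is not taken from the paper: it assumes a bigram model, one state per history word, and a single shared back-off state reached through an epsilon (no-word) arc weighted by the back-off mass. The names `BACKOFF`, `build_network` and `derivations`, and all probabilities, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

BACKOFF = "<bo>"  # hypothetical name for the shared unigram (back-off) state


def build_network(bigram, unigram, alpha):
    """Build arcs[state] = list of (label, next_state, prob).

    A label of None marks an epsilon arc that consumes no word.
    """
    arcs = defaultdict(list)
    # Seen bigrams (h, w): direct arc from state h to state w.
    for (h, w), p in bigram.items():
        arcs[h].append((w, w, p))
    # Every history state may also give up its context via an epsilon arc.
    for h, a in alpha.items():
        arcs[h].append((None, BACKOFF, a))
    # From the back-off state, any word is emitted with its unigram probability.
    for w, p in unigram.items():
        arcs[BACKOFF].append((w, w, p))
    return arcs


def derivations(arcs, state, words):
    """Return the probability of every path from `state` that emits `words`."""
    if not words:
        return [1.0]
    out = []
    for label, nxt, p in arcs[state]:
        if label is None:
            # Epsilon arc: move to the back-off state without consuming a word.
            out += [p * q for q in derivations(arcs, nxt, words)]
        elif label == words[0]:
            out += [p * q for q in derivations(arcs, nxt, words[1:])]
    return out


# Toy model (assumed numbers, not estimated from any corpus).
unigram = {"a": 0.5, "b": 0.5}
bigram = {("a", "b"): 0.6}
alpha = {"a": 0.4, "b": 1.0}

net = build_network(bigram, unigram, alpha)
paths = derivations(net, BACKOFF, ["a", "b"])
print(len(paths), paths)
```

In this toy network the sentence "a b" has two derivations: one through the direct bigram arc and one through the back-off arc followed by the unigram arc. A strict back-off model would allow only the first, which is the sense in which the network is non-deterministic and only approximates the original model. Whether a recognizer sums over the derivations or keeps the best one is a design choice this sketch leaves open.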
This publication has 2 references indexed in Scilit:
- The estimation of powerful language models from small and large corpora. Published by Institute of Electrical and Electronics Engineers (IEEE), 1993
- The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 1991