Statistical language modeling using a variable context length
- 24 December 2002
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, 494-497
- https://doi.org/10.1109/icslp.1996.607162
Abstract
In this paper we investigate statistical language models with a variable context length. For such models, the number of relevant words in a context is not fixed, as in conventional M-gram models, but depends on the context itself. We develop a measure for the quality of variable-length models and present a pruning algorithm, based on this measure, for the creation of such models. Further, we address the question of how the use of a special backing-off distribution can improve the language models.
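The idea of a variable context length can be illustrated with a small sketch. The paper's own quality measure and pruning algorithm are not reproduced here; the snippet below is a minimal, hypothetical variant in which a longer context is kept only if its predictive distribution diverges sufficiently (a count-weighted KL-style score) from that of the shortened context. The function names, the toy corpus, and the threshold value are illustrative assumptions, not the authors' formulation.

```python
import math
from collections import defaultdict

def train_counts(tokens, order=3):
    """Collect n-gram counts for every context length up to order-1."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        for n in range(order):          # context lengths 0 .. order-1
            if i - n < 0:
                break
            ctx = tuple(tokens[i - n:i])
            counts[ctx][w] += 1
    return counts

def dist(counts, ctx):
    """Relative frequencies of successor words for a given context."""
    total = sum(counts[ctx].values())
    return {w: c / total for w, c in counts[ctx].items()}

def prune(counts, threshold=0.05):
    """Keep a long context only if its successor distribution diverges
    enough from that of the shortened context; otherwise the model
    backs off to the shorter context (hypothetical KL-style criterion)."""
    kept = {}
    for ctx in counts:
        if len(ctx) == 0:               # always keep the unigram root
            kept[ctx] = counts[ctx]
            continue
        p_long = dist(counts, ctx)
        p_short = dist(counts, ctx[1:])
        n = sum(counts[ctx].values())
        score = n * sum(p * math.log(p / p_short.get(w, 1e-12))
                        for w, p in p_long.items())
        if score >= threshold:
            kept[ctx] = counts[ctx]
    return kept
```

Pruning in this style yields a tree of contexts whose depth varies by branch: informative histories survive at full length while uninformative ones collapse onto their suffix, which is exactly the structural property that distinguishes variable-length models from fixed M-gram models.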