Cooccurrence smoothing for stochastic language modeling
- 1 January 1992
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, pp. 161-164
- https://doi.org/10.1109/icassp.1992.225947
Abstract
Training corpora for stochastic language models are virtually always too small for maximum-likelihood estimation, so smoothing the models is of great importance. The authors derive the cooccurrence smoothing technique for stochastic language modeling and give experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved the test-set perplexity by 14% on a 100,000-word German text corpus and by 10% on a 1-million-word English corpus.
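The abstract's central idea is that a sparse bigram estimate P(w|h) can be smoothed by redistributing probability among words that occur in similar contexts. A minimal sketch of that idea is below; the corpus, the left-context overlap recipe for the confusion probabilities, and the interpolation weight are illustrative assumptions, not the paper's exact estimator.

```python
from collections import Counter, defaultdict

# Toy corpus; a simplified sketch of cooccurrence smoothing.
corpus = "a cat sat here a dog sat here a cat ran here".split()
vocab = sorted(set(corpus))

# Maximum-likelihood bigram probabilities P_ML(w | h).
bigrams = Counter(zip(corpus, corpus[1:]))
histories = Counter(corpus[:-1])

def p_ml(w, h):
    return bigrams[(h, w)] / histories[h] if histories[h] else 0.0

# Confusion probabilities P_C(w | w'): here, words that share left
# contexts are treated as mutually confusable (an assumed recipe).
left = defaultdict(Counter)          # left[w][h] = times h precedes w
for h, w in zip(corpus, corpus[1:]):
    left[w][h] += 1

def overlap(w, w2):
    return sum(min(left[w][h], left[w2][h]) for h in left[w2])

norm = {w2: sum(overlap(v, w2) for v in vocab) for w2 in vocab}

def p_conf(w, w2):
    return overlap(w, w2) / norm[w2] if norm[w2] else 0.0

def p_smooth(w, h, lam=0.5):
    # Cooccurrence term P_CO(w | h) = sum_{w'} P_C(w | w') P_ML(w' | h),
    # linearly interpolated with the ML estimate.
    p_co = sum(p_conf(w, w2) * p_ml(w2, h) for w2 in vocab)
    return lam * p_ml(w, h) + (1 - lam) * p_co
```

In this toy corpus the bigram "dog ran" never occurs, so P_ML(ran|dog) = 0, yet the smoothed model assigns it nonzero probability because "ran" and "sat" share the left context "cat"; this spreading of mass to unseen bigrams is what lowers test-set perplexity.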