Cooccurrence smoothing for stochastic language modeling
- 1 January 1992
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, pp. 161-164
- https://doi.org/10.1109/icassp.1992.225947
Abstract
Training corpora for stochastic language models are virtually always too small for maximum-likelihood estimation, so smoothing the models is of great importance. The authors derive the cooccurrence smoothing technique for stochastic language modeling and give experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved the test-set perplexity by 14% on a 100,000-word German text corpus and by 10% on a 1-million-word English corpus.
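The abstract's central idea is that a sparse bigram estimate P(w|h) can be smoothed by redistributing probability among words that occur in similar contexts. A minimal sketch of that idea is below; the corpus, the left-context overlap recipe for the confusion probabilities, and the interpolation weight are illustrative assumptions, not the paper's exact estimator.

```python
from collections import Counter, defaultdict

# Toy corpus; a simplified sketch of cooccurrence smoothing.
corpus = "a cat sat here a dog sat here a cat ran here".split()
vocab = sorted(set(corpus))

# Maximum-likelihood bigram probabilities P_ML(w | h).
bigrams = Counter(zip(corpus, corpus[1:]))
histories = Counter(corpus[:-1])

def p_ml(w, h):
    return bigrams[(h, w)] / histories[h] if histories[h] else 0.0

# Confusion probabilities P_C(w | w'): here, words that share left
# contexts are treated as mutually confusable (an assumed recipe).
left = defaultdict(Counter)          # left[w][h] = times h precedes w
for h, w in zip(corpus, corpus[1:]):
    left[w][h] += 1

def overlap(w, w2):
    return sum(min(left[w][h], left[w2][h]) for h in left[w2])

norm = {w2: sum(overlap(v, w2) for v in vocab) for w2 in vocab}

def p_conf(w, w2):
    return overlap(w, w2) / norm[w2] if norm[w2] else 0.0

def p_smooth(w, h, lam=0.5):
    # Cooccurrence term P_CO(w | h) = sum_{w'} P_C(w | w') P_ML(w' | h),
    # linearly interpolated with the ML estimate.
    p_co = sum(p_conf(w, w2) * p_ml(w2, h) for w2 in vocab)
    return lam * p_ml(w, h) + (1 - lam) * p_co
```

In this toy corpus the bigram "dog ran" never occurs, so P_ML(ran|dog) = 0, yet the smoothed model assigns it nonzero probability because "ran" and "sat" share the left context "cat"; this spreading of mass to unseen bigrams is what lowers test-set perplexity.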