Class phrase models for language modeling
- 24 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, 398-401
- https://doi.org/10.1109/icslp.1996.607138
Abstract
Previous attempts to automatically determine multi-words as the basic unit for language modeling have succeeded in extending bigram models to improve the perplexity of the language model and/or the word accuracy of the speech decoder. However, none of these techniques has so far improved on the trigram model, except on the rather controlled ATIS task (McCandless & Glass, 1994). We therefore propose an algorithm that directly minimizes the perplexity of a bigram model. The new algorithm is able to reduce the trigram perplexity and also achieves word accuracy improvements in the Verbmobil task. It is the natural counterpart of successful word classification algorithms for language modeling that minimize the leaving-one-out bigram perplexity. We also give some details on the use of class-finding techniques and m-gram models, which can be crucial to successful applications of this technique.
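The idea of choosing multi-word units by directly minimizing bigram perplexity can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes add-alpha smoothing and training-set perplexity instead of the leaving-one-out criterion, uses no word classes, and searches greedily. All function names and parameters (`bigram_log_prob`, `perplexity`, `merge_pair`, `greedy_phrases`, `alpha`, `min_count`) are hypothetical. Perplexity is normalized by the original word count so that different segmentations remain comparable.

```python
import math
from collections import Counter


def bigram_log_prob(sentences, alpha=0.1):
    """Total log-probability of the corpus under an add-alpha bigram model
    estimated on the same corpus (assumption: no held-out data)."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        histories.update(tokens[:-1])            # counts of each history token
        bigrams.update(zip(tokens, tokens[1:]))  # counts of (history, word)
    vocab_size = len({t for s in sentences for t in s} | {"</s>"})
    return sum(
        c * math.log((c + alpha) / (histories[h] + alpha * vocab_size))
        for (h, _w), c in bigrams.items()
    )


def perplexity(sentences, n_words, alpha=0.1):
    """Perplexity per ORIGINAL word, independent of the phrase segmentation."""
    return math.exp(-bigram_log_prob(sentences, alpha) / n_words)


def merge_pair(sentences, pair):
    """Join every left-to-right, non-overlapping occurrence of pair into one token."""
    joined = pair[0] + "_" + pair[1]
    merged = []
    for sent in sentences:
        new, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) == pair:
                new.append(joined)
                i += 2
            else:
                new.append(sent[i])
                i += 1
        merged.append(new)
    return merged


def greedy_phrases(sentences, max_merges=10, min_count=2, alpha=0.1):
    """Repeatedly apply the single merge that lowers bigram perplexity the most;
    stop when no candidate merge lowers it."""
    n_words = sum(len(s) + 1 for s in sentences)  # words plus end-of-sentence
    best_ppl = perplexity(sentences, n_words, alpha)
    phrases = []
    for _ in range(max_merges):
        pair_counts = Counter(p for s in sentences for p in zip(s, s[1:]))
        best = None
        for pair, count in pair_counts.items():
            if count < min_count:
                continue
            ppl = perplexity(merge_pair(sentences, pair), n_words, alpha)
            if ppl < best_ppl:
                best_ppl, best = ppl, pair
        if best is None:
            break
        sentences = merge_pair(sentences, best)
        phrases.append(best[0] + "_" + best[1])
    return phrases, sentences, best_ppl
```

Calling `greedy_phrases(corpus)` on a list of tokenized sentences returns the accepted phrases, the re-segmented corpus, and the final perplexity. Because perplexity is measured on the training data here, the sketch tends to over-merge. A leaving-one-out or held-out estimate, as the abstract indicates, would be the more faithful criterion.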
This publication has 5 references indexed in Scilit:
- Improved language modelling by unsupervised acquisition of structure. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Improved backing-off for M-gram language modeling. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Algorithms for bigram and trigram word clustering. Speech Communication, 1998
- On automated language acquisition. The Journal of the Acoustical Society of America, 1995
- Improved clustering techniques for class-based statistical language modelling. Published by International Speech Communication Association, 1993