Combination of word-based and category-based language models
- 24 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, 220-223
- https://doi.org/10.1109/icslp.1996.607081
Abstract
A language model combining word-based and category-based n-grams within a backoff framework is presented. Word n-grams conveniently capture sequential relations between particular words, while the category model, which is based on part-of-speech classifications and allows ambiguous category membership, is able to generalise to unseen word sequences and is therefore appropriate in backoff situations. Experiments on the LOB, Switchboard and WSJ0 corpora demonstrate that the technique greatly improves language model perplexities for sparse training sets, and offers significantly improved complexity versus performance tradeoffs when compared with standard trigram models.
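The backoff idea in the abstract can be illustrated with a small sketch. This is not the paper's formulation: it assumes word bigrams with absolute discounting (an assumed smoothing choice), a fixed-length category bigram rather than a longer category n-gram, and a toy hand-tagged corpus. All names (`TAGGED`, `p_cat`, `p_backoff`, the discount `d`) are hypothetical. A seen word pair uses its discounted word-bigram estimate; an unseen pair falls back to the category model, which sums over every part-of-speech tag each word can take.

```python
from collections import Counter

# Toy tagged corpus (hypothetical data). "run" is both NOUN and VERB,
# which gives an example of ambiguous category membership.
TAGGED = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("cat", "NOUN"), ("runs", "VERB"),
    ("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB"),
    ("the", "DET"), ("run", "NOUN"), ("ends", "VERB"),
    ("dogs", "NOUN"), ("run", "VERB"),
]
words = [w for w, _ in TAGGED]
tags = [t for _, t in TAGGED]
vocab = sorted(set(words))

word_tag = Counter(TAGGED)                        # count(word, tag)
word_total = Counter(words)                       # count(word)
tag_count = Counter(tags)                         # count(tag)
word_ctx = Counter(words[:-1])                    # count(word as left context)
tag_ctx = Counter(tags[:-1])                      # count(tag as left context)
bigram = Counter(zip(words[:-1], words[1:]))      # count(v, w)
tag_bigram = Counter(zip(tags[:-1], tags[1:]))    # count(c_prev, c)


def p_cat(w, v):
    """Category-based P(w | v), summing over all tags of v and of w.

    Assumes v occurs in the training data and every tag has at least
    one successor, so that the result sums to 1 over the vocabulary.
    """
    total = 0.0
    for c_prev in tag_count:
        p_prev = word_tag[(v, c_prev)] / word_total[v]   # P(c_prev | v)
        if p_prev == 0.0:
            continue
        for c in tag_count:
            p_trans = tag_bigram[(c_prev, c)] / tag_ctx[c_prev]  # P(c | c_prev)
            p_emit = word_tag[(w, c)] / tag_count[c]             # P(w | c)
            total += p_prev * p_trans * p_emit
    return total


def p_backoff(w, v, d=0.5):
    """Word bigram with absolute discount d, backing off to p_cat."""
    n_ctx = word_ctx[v]
    if n_ctx == 0:
        return p_cat(w, v)            # no word-bigram evidence for v at all
    if bigram[(v, w)] > 0:
        return (bigram[(v, w)] - d) / n_ctx
    seen = [x for x in vocab if bigram[(v, x)] > 0]
    reserved = d * len(seen) / n_ctx  # mass freed by discounting
    unseen_mass = 1.0 - sum(p_cat(x, v) for x in seen)
    return reserved * p_cat(w, v) / unseen_mass
```

In this toy corpus the pair ("the", "dogs") never occurs, yet `p_backoff("dogs", "the")` is non-zero because "dogs" is a NOUN and determiners are followed by nouns. The backoff weight is chosen so that the probabilities for a given context still sum to one.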