Abstract
A technique for incorporating grammatical and morphological information into a trigram-based statistical language model is presented. This is achieved automatically by interpolating the word trigram model with statistical models based on sequences of grammatical categories and/or lemmas. The approach reduces the effect of data sparseness in the trigram model, partly because of the way the interpolation coefficients are chosen. Compared with the trigram model alone, the authors obtained a significant reduction in perplexity on various texts, even when a well-trained trigram model was combined with a small grammatical/morphological model.
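The following minimal Python sketch shows what interpolating a word trigram model with a category-based model looks like. It is not the authors' implementation. The abstract specifies neither how the grammatical model is built nor how the coefficients are chosen, so the sketch makes several assumptions. The category model has the form P(c | c1, c2) · P(w | c), with exactly one category per word. All estimates are unsmoothed maximum likelihood. The weight `lam` is fixed by hand. The class name `InterpolatedLM` and the toy tagged corpus are hypothetical. Perplexity is computed as the exponential of the mean negative log-probability over all trigram positions.

```python
import math
from collections import Counter


def trigram_counts(seq):
    """Count trigrams and their two-symbol histories in a sequence."""
    grams = list(zip(seq, seq[1:], seq[2:]))
    return Counter(grams), Counter((a, b) for a, b, _ in grams)


def ml(tri, hist, a, b, c):
    """Unsmoothed maximum-likelihood P(c | a, b); 0.0 if the history is unseen."""
    n = hist[(a, b)]
    return tri[(a, b, c)] / n if n else 0.0


class InterpolatedLM:
    """Hypothetical word-trigram model mixed with a category-based model.

    Assumption (not from the paper): each word has exactly one category, so
    P_cat(w | w1, w2) = P(c | c1, c2) * P(w | c).
    """

    def __init__(self, tagged, lam=0.7):
        words = [w for w, _ in tagged]
        cats = [c for _, c in tagged]
        self.tags = dict(tagged)            # word -> its (single) category
        self.wtri, self.whist = trigram_counts(words)
        self.ctri, self.chist = trigram_counts(cats)
        self.emit = Counter(tagged)         # counts of (word, category) pairs
        self.cat_count = Counter(cats)      # counts of each category
        self.lam = lam                      # weight of the word trigram model

    def p_word(self, w1, w2, w):
        return ml(self.wtri, self.whist, w1, w2, w)

    def p_cat(self, w1, w2, w):
        c1, c2, c = (self.tags.get(x) for x in (w1, w2, w))
        if c is None:                       # word never seen in training
            return 0.0
        p_c = ml(self.ctri, self.chist, c1, c2, c)
        return p_c * self.emit[(w, c)] / self.cat_count[c]

    def prob(self, w1, w2, w):
        """Linear interpolation of the two models."""
        return self.lam * self.p_word(w1, w2, w) + (1 - self.lam) * self.p_cat(w1, w2, w)

    def perplexity(self, words):
        """exp(-mean log P) over trigram positions; raises ValueError if any P is 0."""
        logs = [math.log(self.prob(*t)) for t in zip(words, words[1:], words[2:])]
        return math.exp(-sum(logs) / len(logs))


# Toy tagged corpus (hypothetical data).
tagged = [("the", "DET"), ("cat", "N"), ("sleeps", "V"),
          ("the", "DET"), ("dog", "N"), ("runs", "V")]
lm = InterpolatedLM(tagged, lam=0.7)
print(lm.prob("the", "dog", "sleeps"))          # about 0.15
print(lm.perplexity(["the", "dog", "sleeps"]))  # about 6.67
```

For "the dog sleeps", the word trigram alone assigns zero probability, so its perplexity would be infinite. The interpolated model still obtains a finite value from the category trigram DET N V. A lemma-based model could be added as a further weighted term in the same way. In practice the weights are usually tuned on held-out data rather than fixed by hand. The abstract credits part of the gain to how the coefficients are chosen but does not say how.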
