An automatic technique to include grammatical and morphological information in a trigram-based statistical language model
- 1 January 1992
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1 (ISSN 1520-6149), pp. 157-160
- https://doi.org/10.1109/icassp.1992.225948
Abstract
A technique for incorporating grammatical and morphological information into a trigram-based statistical language model is presented. This is achieved automatically by interpolating the trigram model (which uses sequences of words) with statistical models based on sequences of grammatical categories and/or lemmas. The approach reduces the effect of data sparseness in the trigram model, in part because of the way the interpolation coefficients are chosen. Compared with the trigram model alone, the authors obtained a significant reduction in perplexity on various texts, even when combining a well-trained trigram model with a small grammatical/morphological model.
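The interpolation idea in the abstract can be illustrated with a minimal sketch. It assumes a plain two-way linear interpolation with a single weight `lam`, chosen by grid search to minimise perplexity on held-out text. The abstract does not give the authors' actual formula or their method for choosing coefficients, so the function names, the grid search, and the probability values below are all hypothetical.

```python
import math

def interpolate(p_word, p_class, lam):
    """Linearly interpolate two probability estimates for the same word.

    p_word  : probability from the word-trigram model
    p_class : probability from a category- or lemma-based model
    lam     : weight given to the word-trigram model (assumed single scalar)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_class

def perplexity(probs):
    """Perplexity of a text given the per-word probabilities a model assigns."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))

def best_lambda(p_a, p_b, steps=100):
    """Grid search for the weight that minimises held-out perplexity.

    This is a stand-in for whatever coefficient estimation the paper uses.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda lam: perplexity(
            [interpolate(a, b, lam) for a, b in zip(p_a, p_b)]
        ),
    )

# Made-up per-word probabilities on a tiny held-out text (illustrative only).
# The trigram model is sharp on seen contexts but very low on sparse ones;
# the category model is flatter.
p_trigram = [0.20, 0.001, 0.10, 0.0005]
p_category = [0.10, 0.02, 0.05, 0.01]

lam = best_lambda(p_trigram, p_category)
mixed = [interpolate(a, b, lam) for a, b in zip(p_trigram, p_category)]
print(f"lambda = {lam:.2f}")
print(f"trigram perplexity      = {perplexity(p_trigram):.1f}")
print(f"interpolated perplexity = {perplexity(mixed):.1f}")
```

Because `lam = 1.0` is one of the candidates, the selected mixture can never score worse than the trigram model alone on the data used to pick the weight. In practice the weight would be estimated on separate held-out text and then evaluated on unseen test text.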
This publication has 8 references indexed in Scilit:
- Three probabilistic language models for a large-vocabulary speech recognizer. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- A morphological model for large vocabulary speech recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- TANGORA - a large vocabulary speech recognition system for five languages. Published by International Speech Communication Association, 1991
- A technique to automatically assign parts-of-speech to words taking into account word-ending information through a probabilistic model. Published by International Speech Communication Association, 1991
- Three different probabilistic language models: comparison and combination. Published by Institute of Electrical and Electronics Engineers (IEEE), 1991
- Measuring information provided by language model and acoustic model in probabilistic speech recognition: Theory and experimental results. Speech Communication, 1990
- Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1987
- A Maximum Likelihood Approach to Continuous Speech Recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 1983