Three different probabilistic language models: comparison and combination

Abstract
The authors outline the problems that arise when a statistical language model is used for speech recognition, especially for inflected languages such as French, Italian or German. After a brief review of two classical models (TriPOS and Trigram), the authors present a refinement of the morphological language model (Trilemma). They describe the methods used to evaluate performance. They discuss experiments combining two of these three building blocks and present a model that takes advantage of all three through a backing-off strategy. With the same vocabulary (20,000 forms), experiments show equivalent results for the classical trigram language model and the trilemma model. The trilemma model can be extended to a full dictionary containing all the inflected forms of each lemma, whereas the trigram model needs a large amount of data to do so.
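As a rough illustration of how a backing-off strategy might chain the three models, the sketch below tries the word trigram first, then the lemma trigram, then the part-of-speech trigram. The abstract does not specify the back-off order, the weights, or the smoothing, so everything here is an assumption: the toy `LEXICON`, the class name `BackoffCascade`, the fixed penalty `alpha`, and the word, lemma, POS order are all hypothetical. The result is an unnormalized score rather than a true probability, and the probability of a word form given its lemma or tag, which a full model would need, is left out.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption): word form -> (lemma, POS tag).
LEXICON = {
    "the": ("the", "DET"),
    "cat": ("cat", "NOUN"),
    "cats": ("cat", "NOUN"),
    "dog": ("dog", "NOUN"),
    "sleeps": ("sleep", "VERB"),
    "sleep": ("sleep", "VERB"),
}


def trigrams(seq):
    """Return all consecutive triples of a sequence as tuples."""
    return [tuple(seq[i:i + 3]) for i in range(len(seq) - 2)]


class BackoffCascade:
    """Unnormalized back-off: word trigram -> lemma trigram -> POS trigram."""

    def __init__(self, sentences, alpha=0.4):
        self.alpha = alpha  # fixed back-off penalty (an assumption)
        # One trigram counter and one history counter per level:
        # level 0 = word forms, level 1 = lemmas, level 2 = POS tags.
        self.tri = [Counter(), Counter(), Counter()]
        self.hist = [Counter(), Counter(), Counter()]
        for words in sentences:
            for level, seq in enumerate(self._views(words)):
                for t in trigrams(seq):
                    self.tri[level][t] += 1
                    self.hist[level][t[:2]] += 1

    @staticmethod
    def _views(words):
        """Map a word sequence to its word, lemma and POS sequences."""
        lemmas = [LEXICON[w][0] for w in words]
        tags = [LEXICON[w][1] for w in words]
        return list(words), lemmas, tags

    def score(self, w1, w2, w3):
        """Relative frequency at the first level that has seen the trigram,
        multiplied by alpha once for every level that was skipped."""
        weight = 1.0
        for level, seq in enumerate(self._views([w1, w2, w3])):
            t = tuple(seq)
            if self.tri[level][t] > 0:
                return weight * self.tri[level][t] / self.hist[level][t[:2]]
            weight *= self.alpha
        return 0.0  # unseen at every level


model = BackoffCascade([["the", "cat", "sleeps"], ["the", "cats", "sleep"]])
print(model.score("the", "cat", "sleeps"))   # seen as a word trigram
print(model.score("the", "cats", "sleeps"))  # backs off to the lemma trigram
print(model.score("the", "dog", "sleeps"))   # backs off to the POS trigram
```

In this toy setting the lemma level covers inflected combinations that never occurred as word trigrams, which is the kind of benefit the abstract attributes to the trilemma model on a full dictionary of inflected forms.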
