A morphological model for large vocabulary speech recognition

Abstract
A morphological model applicable to inflected languages is proposed; it combines the robustness of the tri-POS (part-of-speech trigram) model with the predictive power of the lemma. A semantic component acts at the lemma level, without taking into account the different inflections of a lemma, which makes it trainable even for a vocabulary of 200,000 words. The training corpus for the lemma model (consisting of 38 million words) is labeled in terms of lemma and part of speech using a semiautomatic process. The results obtained with this new model are reported. The model shows another way to incorporate knowledge into the purely probabilistic framework of hidden Markov models.
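
As an illustration only, the idea of pairing a part-of-speech trigram with a lemma-level predictor can be sketched as follows. The abstract does not give the model's equations, so the factorization used here (POS trigram times lemma bigram times the probability of an inflected form given its lemma and POS), the toy corpus, the tag names, and the function `word_prob` are all assumptions made for this sketch, not the authors' formulation. The estimates are unsmoothed relative frequencies, and the product is a score rather than a normalized probability.

```python
from collections import Counter

# Toy corpus labeled as (word, lemma, POS). Sentences and tags are invented
# for illustration; the real corpus in the abstract has 38 million words.
corpus = [
    [("the", "the", "DET"), ("dogs", "dog", "NOUN"), ("bark", "bark", "VERB")],
    [("the", "the", "DET"), ("dog", "dog", "NOUN"), ("barks", "bark", "VERB")],
]

pos_tri, pos_bi = Counter(), Counter()    # POS trigram and its history counts
lem_bi, lem_uni = Counter(), Counter()    # lemma bigram and its history counts
form, lem_pos = Counter(), Counter()      # inflected form given (lemma, POS)

for sent in corpus:
    pos = ["<s>", "<s>"] + [p for _, _, p in sent]
    lem = ["<s>"] + [l for _, l, _ in sent]
    for i in range(2, len(pos)):
        pos_tri[tuple(pos[i - 2:i + 1])] += 1
        pos_bi[tuple(pos[i - 2:i])] += 1
    for i in range(1, len(lem)):
        lem_bi[(lem[i - 1], lem[i])] += 1
        lem_uni[lem[i - 1]] += 1
    for w, l, p in sent:
        form[(w, l, p)] += 1
        lem_pos[(l, p)] += 1


def ratio(num, den):
    """Relative frequency, returning 0.0 for an unseen history."""
    return num / den if den else 0.0


def word_prob(word, lemma, pos, prev_lemma, prev_pos2):
    """Assumed score: P(pos | two previous POS) * P(lemma | previous lemma)
    * P(word | lemma, pos). Hypothetical factorization, no smoothing."""
    p_pos = ratio(pos_tri[prev_pos2 + (pos,)], pos_bi[prev_pos2])
    p_lem = ratio(lem_bi[(prev_lemma, lemma)], lem_uni[prev_lemma])
    p_form = ratio(form[(word, lemma, pos)], lem_pos[(lemma, pos)])
    return p_pos * p_lem * p_form


score = word_prob("barks", "bark", "VERB", "dog", ("DET", "NOUN"))
```

In this sketch the lemma statistics are shared by "bark" and "barks", which is the sense in which a lemma-level component needs far fewer parameters than a model over every inflected form.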
