Variable-length sequence modeling: multigrams

Abstract
The conventional n-gram language model exploits dependencies between words and their fixed-length past. This letter presents the multigram model, which represents sentences as concatenations of variable-length sequences of units, and describes an algorithm for unsupervised estimation of the model parameters. The approach is illustrated on the segmentation of sequences of letters into subword-like units, and evaluated as a language model on a corpus of transcribed spoken sentences. Multigrams can provide a significantly lower test set perplexity than n-gram models.
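To make the idea concrete, here is a minimal Python sketch of the multigram assumption, not the authors' exact algorithm: a segmentation is scored as the product of independent unit probabilities, and dynamic programming recovers the most likely segmentation of a letter string. The paper's unsupervised estimation is EM-like over all segmentations; this sketch uses the simpler Viterbi approximation, re-estimating unit probabilities from counts in the single best segmentation. The names `best_segmentation` and `reestimate`, the maximum unit length of 4, and the uniform initialization are all illustrative assumptions.

```python
from math import log, inf
from collections import Counter

def best_segmentation(sentence, unit_logprob, max_len=4):
    """best[i] = log probability of the best segmentation of
    sentence[:i]; a segmentation's probability is the product of
    independent unit probabilities (the multigram assumption)."""
    n = len(sentence)
    best = [0.0] + [-inf] * n
    back = [0] * (n + 1)          # start index of the last unit
    for i in range(1, n + 1):
        for k in range(1, min(max_len, i) + 1):
            lp = unit_logprob.get(sentence[i - k:i])
            if lp is not None and best[i - k] + lp > best[i]:
                best[i], back[i] = best[i - k] + lp, i - k
    units, i = [], n
    while i > 0:                  # walk back-pointers to recover units
        units.append(sentence[back[i]:i])
        i = back[i]
    return list(reversed(units)), best[n]

def reestimate(corpus, unit_logprob, max_len=4, iterations=5):
    """Viterbi-style re-estimation: segment each sentence with the
    current parameters, count the units used, and renormalize the
    counts into new unit probabilities."""
    for _ in range(iterations):
        counts = Counter()
        for sentence in corpus:
            units, _ = best_segmentation(sentence, unit_logprob, max_len)
            counts.update(units)
        total = sum(counts.values())
        unit_logprob = {u: log(c / total) for u, c in counts.items()}
    return unit_logprob
```

A toy run on letter strings, initialized uniformly over all substrings up to the maximum unit length so that every sentence is segmentable:

```python
corpus = ["abracadabra", "abraabra"]
vocab = {s[i:i + k] for s in corpus for i in range(len(s))
         for k in range(1, 5) if i + k <= len(s)}
init = {u: log(1.0 / len(vocab)) for u in vocab}
print(reestimate(corpus, init))   # probabilities concentrate on units like "abra"
```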
