Modeling long distance dependence in language: topic mixtures vs. dynamic cache models

24 December 2002

proceedings article
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1, 236-239
https://doi.org/10.1109/icslp.1996.607085

Abstract

In this paper, we investigate a new statistical language model which captures topic-related dependencies of words within and across sentences. First, we develop a sentence-level mixture language model that takes advantage of the topic constraints in a sentence or article. Second, we introduce topic-dependent dynamic cache adaptation techniques in the framework of the mixture model. Experiments with the static (or unadapted) mixture model on the 1994 WSJ task indicated a 21% reduction in perplexity and a 3-4% improvement in recognition accuracy over a general n-gram model. The static mixture model also improved recognition performance over an adapted n-gram model. Mixture adaptation techniques contributed a further 14% reduction in perplexity and a small improvement in recognition accuracy.
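
The idea of a sentence-level mixture can be illustrated with a small sketch. The abstract does not give the paper's formulation, so the code below assumes a common form: the probability of a sentence is a weighted sum, over topics, of the sentence probability under each topic's model. Everything in it is hypothetical and chosen for illustration only: the topic names, the toy unigram probabilities (a real system would use smoothed n-gram components), the equal mixture weights, and the floor for unseen words.

```python
import math

# Hypothetical toy topic models: per-topic unigram probabilities.
# Assumption: unigrams stand in for the smoothed n-gram components
# that a real system would use.
TOPIC_MODELS = {
    "finance": {"stocks": 0.40, "fell": 0.30, "rain": 0.05, "today": 0.25},
    "weather": {"stocks": 0.05, "fell": 0.30, "rain": 0.40, "today": 0.25},
}

# Hypothetical mixture weights (must sum to 1).
MIXTURE_WEIGHTS = {"finance": 0.5, "weather": 0.5}

# Hypothetical floor probability for words a topic model has not seen.
UNSEEN_PROB = 1e-6


def sentence_log_prob(words, model):
    """Log probability of the whole sentence under one topic model."""
    return sum(math.log(model.get(word, UNSEEN_PROB)) for word in words)


def topic_joint_probs(words, models=TOPIC_MODELS, weights=MIXTURE_WEIGHTS):
    """weight_k * P_k(sentence) for every topic k."""
    return {
        topic: weights[topic] * math.exp(sentence_log_prob(words, model))
        for topic, model in models.items()
    }


def sentence_mixture_prob(words, models=TOPIC_MODELS, weights=MIXTURE_WEIGHTS):
    """Sentence-level mixture: sum_k weight_k * P_k(sentence).

    The mixing happens once per sentence, not once per word, so all
    words in the sentence are scored by the same topic component.
    """
    return sum(topic_joint_probs(words, models, weights).values())


def topic_posteriors(words, models=TOPIC_MODELS, weights=MIXTURE_WEIGHTS):
    """Posterior probability of each topic given the sentence."""
    joint = topic_joint_probs(words, models, weights)
    total = sum(joint.values())
    return {topic: value / total for topic, value in joint.items()}


if __name__ == "__main__":
    sentence = ["stocks", "fell", "today"]
    print(sentence_mixture_prob(sentence))
    print(topic_posteriors(sentence))
```

Because the weights are applied to whole-sentence probabilities, a sentence whose words all favour one topic is dominated by that topic's component, which is one way to capture within-sentence topic constraints. The dynamic cache adaptation described in the abstract is not modelled in this sketch.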
