Exploiting latent semantic information in statistical language modeling
- 1 August 2000
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the IEEE
- Vol. 88 (8) , 1279-1296
- https://doi.org/10.1109/5.880084
Abstract
Statistical language models used in large-vocabulary speech recognition must properly encapsulate the various constraints, both local and global, present in the language. While local constraints are readily captured through n-gram modeling, global constraints, such as long-term semantic dependencies, have been more difficult to handle within a data-driven formalism. This paper focuses on the use of latent semantic analysis, a paradigm that automatically uncovers the salient semantic relationships between words and documents in a given corpus. In this approach, (discrete) words and documents are mapped onto a (continuous) semantic vector space, in which familiar clustering techniques can be applied. This leads to the specification of a powerful framework for automatic semantic classification, as well as the derivation of several language model families with various smoothing properties. Because of their large-span nature, these language models are well suited to complement conventional n-grams. An integrative formulation is proposed for harnessing this synergy, in which the latent semantic information is used to adjust the standard n-gram probability. Such hybrid language modeling compares favorably with the corresponding n-gram baseline: experiments conducted on the Wall Street Journal domain show a reduction in average word error rate of over 20%. This paper concludes with a discussion of intrinsic tradeoffs, such as the influence of training data selection on the resulting performance.
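The core technique the abstract describes, latent semantic analysis, can be sketched in a few lines: a word-document co-occurrence matrix is factored by singular value decomposition and truncated to a low rank, mapping discrete words onto points in a continuous semantic space where proximity reflects topical similarity. The toy corpus, vocabulary, and rank below are illustrative assumptions, not the paper's Wall Street Journal setup or its exact weighting scheme.

```python
import numpy as np

# Toy corpus: three "documents" over a six-word vocabulary
# (illustrative data, not from the paper).
vocab = ["stock", "market", "trade", "ball", "team", "game"]
docs = [
    "stock market trade stock",
    "team game ball team game",
    "market trade stock market",
]

# Word-document count matrix W (|vocab| x |docs|).
W = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        W[vocab.index(word), j] += 1.0

# Rank-k truncated SVD: W is approximated by U_k S_k V_k^T.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]  # each row: one word in the latent space

def cosine(u, v):
    """Cosine similarity between two latent-space vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Words that co-occur in the same documents land close together...
sim_related = cosine(word_vecs[vocab.index("stock")],
                     word_vecs[vocab.index("market")])
# ...while words from disjoint documents end up near-orthogonal.
sim_unrelated = cosine(word_vecs[vocab.index("stock")],
                       word_vecs[vocab.index("ball")])
```

Because the finance words and sports words never share a document here, their latent vectors fall into orthogonal subspaces, so `sim_related` is near 1 and `sim_unrelated` near 0. The paper's hybrid model then uses distances of this kind, between a candidate word and the current document history, to rescale standard n-gram probabilities.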