On smoothing techniques for bigram-based natural language modelling
- 1 January 1991
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 825-828 vol.2
- https://doi.org/10.1109/icassp.1991.150464
Abstract
The authors study various problems related to smoothing bigram probabilities for natural language modeling: the type of interpolation, i.e. linear vs. nonlinear, the optimal estimation of interpolation parameters, and the use of word equivalence classes (parts of speech). A nonlinear interpolation method that results in significant improvements over linear interpolation in the experimental tests is proposed. It is shown that the leaving-one-out method in combination with the maximum likelihood criterion can be efficiently used for the optimal estimation of interpolation parameters. In addition, an automatic clustering procedure is developed for finding word equivalence classes using a maximum likelihood criterion. Experimental results are presented for two text databases: a German database with 100,000 words and an English database with 1.1 million words.
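The abstract contrasts linear and nonlinear interpolation of bigram probabilities without giving formulas. As a rough illustration only, the sketch below shows textbook linear interpolation next to one commonly used nonlinear form, absolute discounting. Whether this matches the exact method proposed in the paper is an assumption; the function names, the weight `lam`, and the discount `d` are hypothetical choices made for this example.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram counts, bigram counts, and history (context) counts."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # how often each word occurs as a history
    return unigrams, bigrams, contexts


def linear_interp(w, v, unigrams, bigrams, contexts, lam=0.7):
    """Linear interpolation: lam * N(v,w)/N(v) + (1 - lam) * p(w).

    lam is an illustrative value, not one taken from the paper.
    """
    p_uni = unigrams[w] / sum(unigrams.values())
    if contexts[v] == 0:
        return p_uni  # unseen history: fall back to the unigram estimate
    return lam * bigrams[(v, w)] / contexts[v] + (1 - lam) * p_uni


def absolute_discount(w, v, unigrams, bigrams, contexts, d=0.5):
    """Nonlinear interpolation in absolute-discounting form (assumed here).

    A fixed discount d (0 < d < 1) is subtracted from every seen bigram
    count, and the freed mass is redistributed according to p(w).
    """
    p_uni = unigrams[w] / sum(unigrams.values())
    n_v = contexts[v]
    if n_v == 0:
        return p_uni
    # Number of distinct words observed after the history v.
    n_plus = sum(1 for (a, _) in bigrams if a == v)
    discounted = max(bigrams[(v, w)] - d, 0.0) / n_v
    backoff_mass = d * n_plus / n_v
    return discounted + backoff_mass * p_uni
```

Both estimators sum to one over the vocabulary for a seen history. The difference is that the linear form scales every relative frequency by the same factor, whereas the discounting form removes a constant amount from each seen count, so frequent bigrams are changed proportionally less.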
This publication has 9 references indexed in Scilit:
- A 10000-word continuous-speech recognition system. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- A cache-based natural language model for speech recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 1990
- Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1987
- Natural Language Modeling for Phoneme-to-Text Transcription. Published by Institute of Electrical and Electronics Engineers (IEEE), 1986
- On Turing's formula for word probabilities. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1985
- Markov Source Modeling of Text Generation. Published by Springer Nature, 1985
- A Maximum Likelihood Approach to Continuous Speech Recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 1983
- Theory of Point Estimation. Published by Springer Nature, 1983
- The Population Frequencies of Species and the Estimation of Population Parameters. Biometrika, 1953