Tagging text with a probabilistic model
- 1 January 1991
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- pp. 809-812 vol. 2 (ISSN 1520-6149)
- https://doi.org/10.1109/icassp.1991.150460
Abstract
Experiments on the use of a probabilistic model to tag English text, that is, to assign to each word the correct tag (part of speech) in the context of the sentence, are presented. A simple triclass Markov model is used, and the best way to estimate its parameters, depending on the kind and amount of training data provided, is investigated. Two approaches are compared: using text that has been tagged by hand and computing relative frequency counts, and using untagged text and training the model as a hidden Markov process according to a maximum likelihood principle. Experiments show that the best training is obtained by using as much tagged text as is available, and that maximum likelihood training may improve the accuracy of the tagging.
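The first approach described in the abstract can be illustrated with a minimal sketch (not the authors' code): a "triclass" tagger whose transition probabilities P(t_i | t_{i-2}, t_{i-1}) and emission probabilities P(w_i | t_i) are estimated by relative frequency from hand-tagged text, then applied with Viterbi decoding. The corpus format, the add-alpha smoothing, and the boundary tag "<s>" are assumptions made for the example, not details from the paper.

```python
from collections import defaultdict
import math

def train_relative_frequency(tagged_sentences):
    """Estimate triclass transition and emission counts by relative frequency."""
    trans = defaultdict(lambda: defaultdict(int))   # (t_{i-2}, t_{i-1}) -> {t_i: count}
    emit = defaultdict(lambda: defaultdict(int))    # tag -> {word: count}
    tags = set()
    for sent in tagged_sentences:                   # each sentence: list of (word, tag)
        prev2, prev1 = "<s>", "<s>"
        for word, tag in sent:
            trans[(prev2, prev1)][tag] += 1
            emit[tag][word.lower()] += 1
            tags.add(tag)
            prev2, prev1 = prev1, tag
    return trans, emit, sorted(tags)

def smoothed_logprob(row, key, alpha=0.1):
    """Relative frequency with add-alpha smoothing so unseen events stay finite."""
    total = sum(row.values())
    size = max(len(row), 1)
    return math.log((row.get(key, 0) + alpha) / (total + alpha * size))

def viterbi_tag(words, trans, emit, tags):
    """Find the most likely tag sequence under the triclass model."""
    # Each Viterbi state is the tag pair (t_{i-1}, t_i); start from ("<s>", "<s>").
    best = {("<s>", "<s>"): (0.0, [])}
    for word in words:
        w = word.lower()
        new_best = {}
        for (p2, p1), (score, path) in best.items():
            for tag in tags:
                s = (score
                     + smoothed_logprob(trans[(p2, p1)], tag)
                     + smoothed_logprob(emit[tag], w))
                state = (p1, tag)
                if state not in new_best or s > new_best[state][0]:
                    new_best[state] = (s, path + [tag])
        best = new_best
    return max(best.values())[1]

if __name__ == "__main__":
    # Toy hand-tagged corpus (illustrative only); the paper's experiments used large corpora.
    corpus = [
        [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
        [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
        [("the", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
    ]
    trans, emit, tags = train_relative_frequency(corpus)
    print(viterbi_tag(["the", "dog", "sleeps"], trans, emit, tags))
    # -> ['DET', 'NOUN', 'VERB']
```

The second approach compared in the abstract would replace `train_relative_frequency` with hidden-Markov-model training (e.g. Baum-Welch re-estimation) over untagged text; the decoding step would remain the same.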