SEMI-AUTOMATIC EXTRACTION OF LINGUISTIC INFORMATION FOR SYNTACTIC DISAMBIGUATION

1 October 1993

journal article
research article
Published by Taylor & Francis in Applied Artificial Intelligence

Vol. 7 (4), 339-364
https://doi.org/10.1080/08839519308949994

Abstract

The robustness of NLP techniques can be improved by combining “shallow” methods, such as statistical analysis, with traditional knowledge-based methods, such as syntax and semantics. This paper describes a hybrid methodology for extracting preference criteria for syntactically ambiguous structures from corpora. The method is based on the statistical analysis of word co-occurrences augmented with syntactic and semantic tags, which we call clustered association data. Compared with similar algorithms proposed in the literature, the proposed method is shown to exhibit a better trade-off between the precision of the acquired data and the amount of manual work required. Furthermore, the use of semantic tags makes it possible to obtain a statistically relevant amount of reliable data even when the application corpus does not exceed 500,000 words.
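
The idea of clustering word co-occurrences by semantic tag can be illustrated with a minimal sketch. It is not the authors' algorithm: the tag set, the toy co-occurrence triples, and the function names (`cluster`, `build_counts`, `prefer_attachment`) are all assumptions made for illustration. The sketch only shows why counting at the tag level can yield usable evidence for a word pair that never occurs in a small corpus.

```python
from collections import Counter

# Hypothetical semantic tags; the tag set used in the paper is not given here.
SEMANTIC_TAGS = {
    "eat": "ACT",
    "pizza": "FOOD", "soup": "FOOD", "anchovies": "FOOD",
    "fork": "INSTRUMENT", "spoon": "INSTRUMENT",
}

# Toy (head, preposition, modifier) co-occurrences, assumed to have been
# extracted from unambiguous contexts in a corpus.
OBSERVED = [
    ("eat", "with", "fork"),
    ("eat", "with", "spoon"),
    ("soup", "with", "anchovies"),
]

def cluster(triple):
    """Replace the two words of a triple with their semantic tags."""
    head, prep, modifier = triple
    return (SEMANTIC_TAGS.get(head, "UNK"), prep, SEMANTIC_TAGS.get(modifier, "UNK"))

def build_counts(triples):
    """Count co-occurrences at the level of semantic tags, not words."""
    return Counter(cluster(t) for t in triples)

def prefer_attachment(counts, verb, noun, prep, modifier):
    """Compare tag-level counts for verb attachment and noun attachment."""
    verb_score = counts[cluster((verb, prep, modifier))]
    noun_score = counts[cluster((noun, prep, modifier))]
    if verb_score == noun_score:
        return "undecided"
    return "verb" if verb_score > noun_score else "noun"

counts = build_counts(OBSERVED)
print(prefer_attachment(counts, "eat", "pizza", "with", "fork"))
print(prefer_attachment(counts, "eat", "pizza", "with", "anchovies"))
```

In this toy data the pair "pizza with anchovies" is never observed, yet the second call can still express a preference because "soup with anchovies" contributes to the same tag-level cluster. A real system would need a statistical significance measure rather than raw counts.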
