SEMI-AUTOMATIC EXTRACTION OF LINGUISTIC INFORMATION FOR SYNTACTIC DISAMBIGUATION
- 1 October 1993
- journal article
- research article
- Published by Taylor & Francis in Applied Artificial Intelligence
- Vol. 7 (4), 339-364
- https://doi.org/10.1080/08839519308949994
Abstract
The robustness of NLP techniques can be improved by combining “shallow” methods, such as statistical analysis, with traditional knowledge-based methods, such as syntax and semantics. This paper describes a hybrid methodology for extracting, from corpora, preference criteria for syntactically ambiguous structures. The method is based on the statistical analysis of word co-occurrences augmented with syntactic and semantic tags, which we call clustered association data. The proposed method is shown to offer a better trade-off between the precision of the acquired data and the amount of manual work required than similar algorithms proposed in the literature. Furthermore, the use of semantic tags makes it possible to obtain a statistically relevant number of reliable data even when the application corpus does not exceed 500,000 words.
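The idea of clustered association data can be illustrated with a minimal sketch. It assumes that co-occurrences are available as (head, relation, modifier) word triples and that each word has one manually assigned semantic tag. The tag set, the sample triples, the function names, and the use of pointwise mutual information as the association score are all illustrative assumptions; the abstract does not specify any of them.

```python
import math
from collections import Counter

# Hypothetical semantic tags; the abstract does not list the actual tag set.
SEMANTIC_TAGS = {
    "ship": "VEHICLE", "truck": "VEHICLE",
    "wheat": "GOODS", "coal": "GOODS",
    "port": "PLACE", "depot": "PLACE",
}

def cluster(triples, tags=SEMANTIC_TAGS):
    """Replace words by their semantic tags and count the tag-level triples."""
    counts = Counter()
    for head, rel, mod in triples:
        if head in tags and mod in tags:
            counts[(tags[head], rel, tags[mod])] += 1
    return counts

def association_scores(counts):
    """Score each clustered triple by PMI between (head tag, relation) and modifier tag.

    PMI is an assumed stand-in for whatever measure the paper uses.
    """
    total = sum(counts.values())
    left, right = Counter(), Counter()
    for (head, rel, mod), n in counts.items():
        left[(head, rel)] += n
        right[mod] += n
    return {
        (head, rel, mod): math.log2(n * total / (left[(head, rel)] * right[mod]))
        for (head, rel, mod), n in counts.items()
    }

def prefer(candidates, scores):
    """Pick the candidate attachment with the highest score; unseen ones rank last."""
    return max(candidates, key=lambda c: scores.get(c, float("-inf")))

# Toy word-level co-occurrences (illustrative data, not from the paper).
TRIPLES = [
    ("ship", "of", "wheat"), ("truck", "of", "coal"),
    ("ship", "in", "port"), ("truck", "in", "depot"),
    ("wheat", "in", "depot"),
]
COUNTS = cluster(TRIPLES)
SCORES = association_scores(COUNTS)
```

Because counts are pooled per semantic tag rather than per word, a pair such as ("truck", "of", "wheat") that never occurs in the toy data still maps to a clustered triple with evidence behind it. This pooling is what lets a small corpus yield usable statistics, which is the point the abstract makes about corpora under 500,000 words.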