Using statistical methods to improve knowledge-based news categorization
- 1 April 1993
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Expert
- Vol. 8 (2), 13-23
- https://doi.org/10.1109/64.207425
Abstract
NLDB, a knowledge-based system that automatically categorizes news stories for dissemination, retrieval, and browsing, is discussed. The major knowledge-based component of NLDB is a lexicosemantic pattern matcher that identifies combinations of words and phrases, as well as more complex patterns. These include word roots, grammatical categories, and semantic structures, such as verbs describing classes of events. It is shown that this linguistic analysis outperforms statistical methods. Because building lexicosemantic patterns can be a laborious process, a set of statistical methods is developed that automates pattern acquisition while preserving the benefits of a knowledge-based approach.
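The abstract does not give NLDB's pattern syntax, so the following is only an illustrative sketch of what matching on words, word roots, grammatical categories, and semantic classes might look like. The lexicon, the `Element` type, the pattern for a "mergers" category, and every function name here are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (an assumption, not NLDB data):
# surface word -> (root, grammatical category, semantic class)
LEXICON = {
    "acquired": ("acquire", "verb", "takeover-event"),
    "acquires": ("acquire", "verb", "takeover-event"),
    "bought": ("buy", "verb", "takeover-event"),
    "company": ("company", "noun", "organization"),
    "firm": ("firm", "noun", "organization"),
    "rival": ("rival", "noun", "organization"),
}


@dataclass(frozen=True)
class Element:
    """One slot of a pattern: match on word, root, category, or semantic class."""
    kind: str   # "word", "root", "cat", or "sem"
    value: str


def element_matches(element, token):
    """Check a single token against a single pattern element."""
    word = token.lower()
    # Unknown words fall back to themselves as root, with unknown features.
    root, cat, sem = LEXICON.get(word, (word, "unknown", "unknown"))
    features = {"word": word, "root": root, "cat": cat, "sem": sem}
    return features[element.kind] == element.value


def pattern_matches(pattern, tokens):
    """True if the pattern elements match some contiguous run of tokens."""
    n = len(pattern)
    return any(
        all(element_matches(e, t) for e, t in zip(pattern, tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    )


# Hypothetical category definition: a takeover verb, the word "a",
# then any word in the "organization" semantic class.
PATTERNS = {
    "mergers": [
        [Element("sem", "takeover-event"), Element("word", "a"), Element("sem", "organization")],
    ],
}


def categorize(text):
    """Return the sorted list of categories whose patterns match the text."""
    tokens = text.replace(".", " ").split()
    return sorted(
        category
        for category, patterns in PATTERNS.items()
        if any(pattern_matches(p, tokens) for p in patterns)
    )


print(categorize("The firm acquired a rival."))
```

Because the pattern is written against a semantic class rather than a fixed word, the same rule would also match "bought a company", which is the kind of generalization over plain keyword matching that the abstract attributes to linguistic analysis.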
This publication has 9 references indexed in Scilit:
- Extracting company names from text. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- An evaluation of phrasal and clustered representations on a text categorization task. Published by Association for Computing Machinery (ACM), 1992
- Classifying texts using relevancy signatures. Published by Association for Computational Linguistics (ACL), 1992
- Creating segmented databases from free text for text retrieval. Published by Association for Computing Machinery (ACM), 1991
- Lexico-semantic pattern matching as a companion to parsing in text understanding. Published by Association for Computational Linguistics (ACL), 1991
- GE. Published by Association for Computational Linguistics (ACL), 1991
- SCISOR: extracting information from on-line news. Communications of the ACM, 1990
- Word association norms, mutual information, and lexicography. Published by Association for Computational Linguistics (ACL), 1989
- The use of titles for automatic document classification. Journal of the American Society for Information Science, 1980