Automated learning of decision rules for text categorization
- 1 July 1994
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 12 (3), 233-251
- https://doi.org/10.1145/183422.183423
Abstract
We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to discover automatically classification patterns that can be used for general document categorization or personalized filtering of free text. Previous reports indicate that human-engineered rule-based systems, requiring many man-years of developmental efforts, have been successfully built to "read" documents and assign topics to them. We show that machine-generated decision rules appear comparable to human performance, while using the identical rule-based representation. In comparison with other machine-learning techniques, results on a key benchmark from the Reuters collection show a large gain in performance, from a previously reported 67% recall/precision breakeven point to 80.5%. In the context of a very high-dimensional feature space, several methodological alternatives are examined, including universal versus local dictionaries, and binary versus frequency-related features.
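The abstract reports results as a recall/precision breakeven point, the value at which a classifier's precision and recall are equal. The Python sketch below illustrates that metric only. It assumes a classifier that outputs a numeric score per document and sweeps a decision threshold over those scores; the function names and the toy data are hypothetical, and the abstract does not say how the paper itself locates the breakeven point.

```python
from typing import List, Tuple


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for one topic over a set of documents."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    # Convention assumed here: an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


def breakeven_point(scores: List[float], actual: List[bool]) -> float:
    """Approximate the breakeven point by sweeping a score threshold.

    Returns the mean of precision and recall at the threshold where the
    two values are closest to each other.
    """
    best_gap = None
    best_value = 0.0
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        p, r = precision_recall(predicted, actual)
        gap = abs(p - r)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_value = (p + r) / 2
    return best_value


# Hypothetical scores for six documents and their true topic labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
actual = [True, True, False, True, False, False]
print(round(breakeven_point(scores, actual), 3))
```

A single number of this kind lets systems with different precision/recall trade-offs be compared on the same benchmark, which is how the 67% and 80.5% figures above are meant to be read.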