Feature selection, perceptron learning, and a usability case study for text categorization
- 1 July 1997
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGIR Forum
- Vol. 31 (SI), pp. 67-73
- https://doi.org/10.1145/278459.258537
Abstract
In this paper, we describe an automated learning approach to text categorization based on perceptron learning and a new feature selection metric, called the correlation coefficient. Our approach has been tested on the standard Reuters text categorization collection. Empirical results indicate that our approach outperforms the best published results on this Reuters collection. In particular, our new feature selection method yields a considerable improvement. We also investigate the usability of our automated learning approach by developing a system that categorizes texts into a tree of categories. We compare the accuracy of our learning approach to that of a rule-based, expert system approach that uses a text categorization shell built by Carnegie Group. Although our fully automated approach still yields lower accuracy, incorporating a set of manually chosen words as features allows the combined, semi-automated approach to achieve accuracy close to that of the rule-based approach.
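The two ingredients named in the abstract, a correlation-coefficient feature ranking followed by perceptron learning, can be illustrated with a small sketch. It assumes the formulation of the correlation coefficient commonly quoted in later literature (a signed square root of the chi-square statistic over a term/category 2x2 contingency table) and a plain Rosenblatt perceptron over binary term-presence features; the paper's exact definitions, feature weighting, and training schedule may differ. All function names and the toy documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def correlation_coefficient(n_tp, n_fp, n_fn, n_tn):
    """Assumed CC formulation: signed sqrt of chi-square for one term and one category.

    n_tp: relevant docs containing the term      n_fp: non-relevant docs containing it
    n_fn: relevant docs lacking the term         n_tn: non-relevant docs lacking it
    """
    n = n_tp + n_fp + n_fn + n_tn
    denom = math.sqrt((n_tp + n_fn) * (n_fp + n_tn) * (n_tp + n_fp) * (n_fn + n_tn))
    if denom == 0:
        return 0.0
    # Positive when the term co-occurs with the category more than chance would predict.
    return math.sqrt(n) * (n_tp * n_tn - n_fn * n_fp) / denom


def select_features(docs, labels, k):
    """Rank every term by CC (descending, ties broken alphabetically) and keep the top k."""
    n_rel = sum(labels)
    n_non = len(labels) - n_rel
    df_rel, df_non = Counter(), Counter()  # document frequencies per class
    for doc, is_rel in zip(docs, labels):
        (df_rel if is_rel else df_non).update(set(doc))
    scores = {}
    for term in set(df_rel) | set(df_non):
        tp, fp = df_rel[term], df_non[term]
        scores[term] = correlation_coefficient(tp, fp, n_rel - tp, n_non - fp)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def vectorize(doc, features):
    """Binary term-presence vector over the selected features (an assumed representation)."""
    return [1 if term in doc else 0 for term in features]


def train_perceptron(vectors, labels, max_epochs=100):
    """Classic perceptron: on each mistake, add y * x to the weights and y to the bias."""
    w, b = [0] * len(vectors[0]), 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, is_rel in zip(vectors, labels):
            y = 1 if is_rel else -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # training set separated; stop early
            break
    return w, b


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Hypothetical toy data: is each document about the "grain" category?
docs = [
    {"wheat", "grain", "export"},
    {"grain", "corn", "price"},
    {"wheat", "harvest"},
    {"oil", "crude", "price"},
    {"stock", "market", "price"},
    {"oil", "opec"},
]
labels = [True, True, True, False, False, False]

if __name__ == "__main__":
    features = select_features(docs, labels, 5)
    vectors = [vectorize(d, features) for d in docs]
    w, b = train_perceptron(vectors, labels)
    print(features, w, b)
    print(predict(w, b, vectorize({"wheat", "price"}, features)))
```

On this toy set, "grain" and "wheat" rank highest because they occur only in the positive documents, while "price" scores negatively because it appears mostly in the negative ones. Keeping only the top-ranked (positively correlated) terms is what makes this a one-sided selection, as opposed to ranking by plain chi-square, which would reward negatively correlated terms equally.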