On feature distributional clustering for text categorization
- 1 September 2001
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 146-153
- https://doi.org/10.1145/383952.383976
Abstract
We describe a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently introduced information bottleneck method, which generates a more efficient word-cluster representation of documents. Combined with the classification power of an SVM, this method yields high performance text categorization that can outperform other recent methods in terms of categorization accuracy and representation efficiency. Comparing the accuracy of our method with other techniques, we observe significant dependency of the results on the data set. We discuss the potential reasons for this dependency.
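As a rough illustration of the word-cluster representation mentioned in the abstract, the sketch below maps a tokenized document to a vector of per-cluster counts. The word-to-cluster table, the function name `cluster_representation`, and the sample document are hypothetical and chosen only for illustration; in the paper the clusters come from the information bottleneck method, which is not implemented here, and the resulting vectors would then be passed to an SVM classifier.

```python
from collections import Counter

# Hypothetical word-to-cluster assignment (an assumption for this sketch).
# In the described approach, such a mapping would be produced by
# information bottleneck clustering of words.
WORD_TO_CLUSTER = {
    "goal": 0, "match": 0, "team": 0,
    "stock": 1, "market": 1, "shares": 1,
}


def cluster_representation(tokens, word_to_cluster, n_clusters):
    """Map a tokenized document to a list of per-cluster word counts.

    Words without a cluster assignment are ignored.
    """
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    return [counts.get(c, 0) for c in range(n_clusters)]


doc = "the team won the match as the market watched".split()
vector = cluster_representation(doc, WORD_TO_CLUSTER, n_clusters=2)
```

The vector has one entry per cluster rather than one per vocabulary word, which is the sense in which a cluster-based representation can be more compact than a bag of words.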