A multistrategy approach for digital text categorization from imbalanced documents
- 1 June 2004
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGKDD Explorations Newsletter
- Vol. 6 (1), 70-79
- https://doi.org/10.1145/1007730.1007740
Abstract
The goal of the research described here is to develop a multistrategy classifier system that can be used for document categorization. The system automatically discovers classification patterns by applying several empirical learning methods to different representations for preclassified documents belonging to an imbalanced sample. The learners work in a parallel manner, where each learner carries out its own feature selection based on evolutionary techniques and then obtains a classification model. In classifying documents, the system combines the predictions of the learners by applying evolutionary techniques as well. The system relies on a modular, flexible architecture that makes no assumptions about the design of learners or the number of learners available and guarantees the independence of the thematic domain.
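The abstract does not say which evolutionary technique combines the learners' predictions, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each learner outputs one score per class, combines those scores with a weighted sum, and evolves the weights with a simple genetic loop (elitist selection, uniform crossover, Gaussian mutation). The function names, the weighted-sum combiner, and the choice of balanced accuracy as the fitness measure for an imbalanced sample are all hypothetical.

```python
import random


def combine(scores_per_learner, weights):
    """Weighted sum of per-learner class scores; returns the winning class index."""
    n_classes = len(scores_per_learner[0])
    totals = [0.0] * n_classes
    for weight, scores in zip(weights, scores_per_learner):
        for cls, score in enumerate(scores):
            totals[cls] += weight * score
    return max(range(n_classes), key=totals.__getitem__)


def balanced_accuracy(weights, validation, n_classes):
    """Mean per-class recall, so a rare class counts as much as a frequent one.

    Assumption: this is an illustrative fitness measure, not one named in the abstract.
    """
    hits = [0] * n_classes
    counts = [0] * n_classes
    for scores_per_learner, label in validation:
        counts[label] += 1
        if combine(scores_per_learner, weights) == label:
            hits[label] += 1
    recalls = [h / n for h, n in zip(hits, counts) if n > 0]
    return sum(recalls) / len(recalls)


def evolve_weights(validation, n_learners, n_classes,
                   generations=50, pop_size=20, seed=0):
    """Evolve one non-negative weight per learner (hypothetical helper)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_learners)]
                  for _ in range(pop_size)]

    def fitness(weights):
        return balanced_accuracy(weights, validation, n_classes)

    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation of one weight, clipped at zero.
            i = rng.randrange(n_learners)
            child[i] = max(0.0, child[i] + rng.gauss(0.0, 0.1))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

Here `validation` is a list of `(scores_per_learner, label)` pairs, where `scores_per_learner` holds one list of class scores per learner. Because the combiner only sees score lists, the sketch makes no assumption about how each learner is built or how many there are, which mirrors the modularity the abstract describes.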