A multistrategy approach for digital text categorization from imbalanced documents

1 June 2004

journal article
Published by the Association for Computing Machinery (ACM) in ACM SIGKDD Explorations Newsletter

Vol. 6 (1), pp. 70-79
https://doi.org/10.1145/1007730.1007740

Abstract

The goal of the research described here is to develop a multistrategy classifier system for document categorization. The system automatically discovers classification patterns by applying several empirical learning methods to different representations of preclassified documents drawn from an imbalanced sample. The learners work in parallel: each learner carries out its own feature selection using evolutionary techniques and then builds a classification model. When classifying documents, the system combines the learners' predictions, again by means of evolutionary techniques. The system relies on a modular, flexible architecture that makes no assumptions about the design or the number of the available learners, and it guarantees independence from the thematic domain.
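The abstract says the learners' predictions are combined "by applying evolutionary techniques" but does not give the encoding, fitness function, or genetic operators. The sketch below shows one plausible reading, under stated assumptions: each chromosome is a vector of non-negative per-learner voting weights, documents are labeled by weighted majority vote, and fitness is balanced accuracy (mean per-class recall), chosen here because the training sample is imbalanced. Truncation selection, one-point crossover, and Gaussian mutation are likewise assumptions, and the names `weighted_vote`, `balanced_accuracy`, and `evolve_weights` are hypothetical, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: evolve per-learner voting weights with a simple genetic
# algorithm. Encoding, fitness and operators are assumptions, not the paper's method.


def weighted_vote(predictions: Sequence[Sequence[int]],
                  weights: Sequence[float], n_classes: int) -> List[int]:
    """Combine each learner's predicted labels by weighted voting."""
    combined = []
    for d in range(len(predictions[0])):
        scores = [0.0] * n_classes
        for labels, w in zip(predictions, weights):
            scores[labels[d]] += w
        # max() returns the first maximum, so ties go to the lowest class index.
        combined.append(max(range(n_classes), key=lambda c: scores[c]))
    return combined


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      n_classes: int) -> float:
    """Mean per-class recall, so a minority class counts as much as a majority one."""
    recalls = []
    for c in range(n_classes):
        idx = [i for i, y in enumerate(y_true) if y == c]
        if idx:  # skip classes that do not occur in y_true
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def evolve_weights(predictions: Sequence[Sequence[int]], y_true: Sequence[int],
                   n_classes: int, pop_size: int = 30, generations: int = 40,
                   mutation_rate: float = 0.2, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    n_learners = len(predictions)

    def fitness(w: Sequence[float]) -> float:
        return balanced_accuracy(y_true, weighted_vote(predictions, w, n_classes), n_classes)

    # Start from plain majority voting (all weights 1.0) plus random weight vectors.
    population = [[1.0] * n_learners]
    population += [[rng.random() for _ in range(n_learners)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        elite = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_learners) if n_learners > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Gaussian mutation, clipped so that weights stay non-negative.
            child = [max(0.0, g + rng.gauss(0.0, 0.3)) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy imbalanced validation set: 8 documents of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2
predictions = [
    [0] * 10,                        # learner A: always predicts the majority class
    [0] * 6 + [1] * 4,               # learner B: finds the minority class, with false alarms
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 1],  # learner C
]
majority = balanced_accuracy(y_true, weighted_vote(predictions, [1.0] * 3, 2), 2)
weights, score = evolve_weights(predictions, y_true, n_classes=2)
print(f"majority vote: {majority:.4f}, evolved weights: {score:.4f}")
```

On this toy set an unweighted majority vote reaches a balanced accuracy of 0.6875. Because the all-ones vector is in the initial population and the elite half always survives, the evolved weights can never score below that baseline. Balanced accuracy matters here: learners A and B both reach 0.8 plain accuracy, yet their balanced accuracies are 0.5 and 0.875. The same GA loop, with bit-mask chromosomes over the feature vocabulary and a learner's validation score as fitness, could serve as a sketch of the per-learner evolutionary feature selection the abstract mentions; that variant is not shown.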
