Investigating the Effect of Sampling Methods for Imbalanced Data Distributions
- 1 October 2006
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 5, 4163-4168
- https://doi.org/10.1109/icsmc.2006.384787
Abstract
Classification is an important and well-known technique in machine learning, and the training data significantly influence classification accuracy. However, the training data in real-world applications often have an imbalanced class distribution, so it is important to select suitable training data for classification in this setting. In this paper, we propose a cluster-based sampling approach that selects representative data as training data to improve classification accuracy, and we investigate the effect of under-sampling methods on the imbalanced class distribution problem. In the experiments, we evaluate the performance of our cluster-based sampling approach and of the other sampling methods from previous studies.
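The abstract does not specify the clustering algorithm or how representatives are chosen, so the following is only a minimal sketch of one common way to do cluster-based under-sampling, not the paper's method. It assumes k-means clustering of the majority class and keeps the sample nearest each centroid; the names `kmeans` and `cluster_undersample` and all parameters are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def cluster_undersample(majority, n_keep, seed=0):
    """Keep n_keep majority samples, one nearest to each k-means centroid.

    Assumes n_keep <= len(majority).
    """
    centroids = kmeans(majority, n_keep, seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        rep = min(remaining, key=lambda p: dist(p, c))
        kept.append(rep)
        remaining.remove(rep)  # avoid picking the same sample twice
    return kept


# Synthetic imbalanced data: 200 majority vs. 20 minority points.
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]

# Under-sample the majority class down to the minority class size.
reduced_majority = cluster_undersample(majority, len(minority))
training_set = [(p, 0) for p in reduced_majority] + [(p, 1) for p in minority]
```

Choosing real samples near the centroids, rather than the centroids themselves, keeps the training set made of observed data; other selection rules (for example, random draws per cluster) would fit the same outline.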