Feature selection for the naive Bayesian classifier using decision trees

1 May 2003

journal article
research article
Published by Taylor & Francis in Applied Artificial Intelligence

Vol. 17 (5-6) , 475-487
https://doi.org/10.1080/713827175

Abstract

It is known that the Naive Bayesian classifier (NB) works very well on some domains and poorly on others. Its performance suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than NB on such domains. This paper describes a Selective Bayesian classifier (SBC) that uses only those features that C4.5 would use in its decision tree when learning from a small sample of the training set, thereby combining two classifiers of different natures. Experiments conducted on ten data sets indicate that SBC performs markedly better than NB on all domains, and that SBC outperforms C4.5 on many of the data sets on which C4.5 outperforms NB. An Augmented Bayesian classifier (ABC) is also tested on the same data, and SBC appears to perform as well as ABC. In most cases, SBC can also eliminate more than half of the original attributes, which can greatly reduce the size of the training and test data as well as the running time. Further, the SBC algorithm typically learns faster than both C4.5 and NB, needing fewer training examples to reach high classification accuracy.
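
The selection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes categorical features, substitutes a plain ID3-style information-gain tree for C4.5 (which additionally uses gain ratio and pruning), and picks an arbitrary sampling rule. All function names (`tree_features`, `train_nb`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def tree_features(rows, labels, features, depth=0, max_depth=3):
    """Return the feature indices an ID3-style tree would split on.

    Stand-in for C4.5: information gain only, no gain ratio, no pruning.
    """
    if depth == max_depth or len(set(labels)) <= 1 or not features:
        return set()
    base = entropy(labels)
    best, best_gain = None, 1e-9
    for f in features:
        groups = defaultdict(list)
        for row, y in zip(rows, labels):
            groups[row[f]].append(y)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    if best is None:
        return set()
    used = {best}
    rest = [f for f in features if f != best]
    parts = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        parts[row[best]][0].append(row)
        parts[row[best]][1].append(y)
    for sub_rows, sub_labels in parts.values():
        used |= tree_features(sub_rows, sub_labels, rest, depth + 1, max_depth)
    return used

def train_nb(rows, labels, features):
    """Categorical naive Bayes with Laplace smoothing on the chosen features."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    values = defaultdict(set)            # feature -> observed values
    for row, y in zip(rows, labels):
        for f in features:
            value_counts[(f, y)][row[f]] += 1
            values[f].add(row[f])

    def predict(row):
        def score(c):
            s = math.log(class_counts[c] / len(labels))
            for f in features:
                s += math.log((value_counts[(f, c)][row[f]] + 1)
                              / (class_counts[c] + len(values[f])))
            return s
        return max(class_counts, key=score)

    return predict

# Toy data (hypothetical): feature 1 duplicates feature 0, feature 2 is noise.
random.seed(0)
rows = [(a, a, random.choice("xy")) for a in "01" * 20]
labels = ["pos" if r[0] == "1" else "neg" for r in rows]

# Assumed sampling rule: build the tree on every third training row only.
idx = range(0, len(rows), 3)
selected = sorted(tree_features([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                list(range(3))))

# Train naive Bayes on the full training set, restricted to the tree's features.
predict = train_nb(rows, labels, selected)
```

On this toy data the tree splits only on feature 0, so the duplicated and noisy features are dropped before naive Bayes is trained. The correlated copy is exactly the kind of attribute the abstract says hurts NB.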
