Construct robust rule sets for classification
- 23 July 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 564-569
- https://doi.org/10.1145/775047.775130
Abstract
We study the problem of computing classification rule sets from relational databases so that accurate predictions can be made on test data with missing attribute values. Traditional classifiers perform badly when test data are less complete than the training data because they fit the training database too closely. We introduce the concept of one rule set being more robust than another, that is, able to make more accurate predictions on test data with missing attribute values. We show that the optimal class association rule set is as robust as the complete class association rule set. We then introduce the k-optimal rule set, which makes exactly the same predictions as the optimal class association rule set on test data with up to k missing attribute values. This leads to a hierarchy of k-optimal rule sets in which decreasing size corresponds to decreasing robustness, all of which are more robust than a traditional classification rule set. We introduce two methods to find k-optimal rule sets: an optimal association rule mining approach and a heuristic approximate approach. We show experimentally that a k-optimal rule set generated by the optimal association rule mining approach performs better than one generated by the heuristic approximate approach, and that both rule sets perform significantly better than a typical classification rule set (C4.5Rules) on incomplete test data.
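The notion of robustness in the abstract can be illustrated with a small Python sketch. This is not the paper's mining algorithm: the attribute names, rules, confidence values, and the helper names `matches` and `predict` are all invented for illustration, and the "highest-confidence matching rule wins" policy is an assumed, simplified prediction scheme. The sketch only shows why a rule set that keeps extra rules over other attributes can still predict when a value is missing, while a minimal rule set cannot.

```python
from typing import Optional

# Hypothetical representations (not from the paper):
# a record maps attribute names to values, with None marking a missing value;
# a rule is (antecedent, class label, confidence).
Record = dict[str, Optional[str]]
Rule = tuple[dict[str, str], str, float]


def matches(antecedent: dict[str, str], record: Record) -> bool:
    """A rule fires only if every attribute it tests is present and equal."""
    return all(record.get(attr) == value for attr, value in antecedent.items())


def predict(rules: list[Rule], record: Record, default: str = "unknown") -> str:
    """Return the class of the highest-confidence firing rule, else a default."""
    firing = [rule for rule in rules if matches(rule[0], record)]
    if not firing:
        return default
    return max(firing, key=lambda rule: rule[2])[1]


# A minimal rule set that tests a single attribute (toy values).
small_rules: list[Rule] = [
    ({"outlook": "sunny"}, "play", 0.90),
    ({"outlook": "rainy"}, "stay", 0.85),
]

# A larger rule set that also keeps rules over a second attribute.
robust_rules: list[Rule] = small_rules + [
    ({"wind": "weak"}, "play", 0.80),
    ({"wind": "strong"}, "stay", 0.75),
]

complete: Record = {"outlook": "sunny", "wind": "weak"}
incomplete: Record = {"outlook": None, "wind": "weak"}  # one missing value

# On complete data both rule sets agree.
print(predict(small_rules, complete), predict(robust_rules, complete))
# With 'outlook' missing, only the larger rule set still has a firing rule.
print(predict(small_rules, incomplete), predict(robust_rules, incomplete))
```

In this toy setting the two rule sets agree on the complete record, but only the larger one still produces a class label once `outlook` is missing; the smaller one falls back to the default.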