A comparative analysis of methods for pruning decision trees

1 May 1997

Journal article
Published by the Institute of Electrical and Electronics Engineers (IEEE)

IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19(5), pp. 476-493
https://doi.org/10.1109/34.589207

Abstract

In this paper, we address the problem of retrospectively pruning decision trees induced from data by a top-down approach. This problem has received considerable attention in pattern recognition and machine learning, and many distinct methods have been proposed in the literature. We compare six well-known pruning methods to understand their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulations. Our observations on the characteristics of each method are supported empirically. In particular, extensive experiments on several data sets lead us to conclusions about the predictive accuracy of simplified trees that are opposite to some drawn in the literature. We attribute this divergence to differences in experimental design. Finally, we prove a property of the reduced error pruning method and use it to obtain an objective evaluation of each method's tendency to overprune or underprune.
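To make the abstract's reference to reduced error pruning (REP) concrete, here is a minimal sketch of its common textbook formulation. A tree is grown on training data. Each internal node is then visited bottom-up using a separate pruning set, and its subtree is replaced by a leaf whenever doing so does not increase the number of errors on that set. The code assumes binary trees with numeric threshold tests, and each node stores the majority training class as its `label`. The names `Node`, `predict`, and `reduced_error_prune` are hypothetical. The sketch does not reproduce the paper's proof or its experimental setup.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Case = Tuple[List[float], int]  # (feature vector, class label)


@dataclass
class Node:
    label: int                     # majority class of the training cases at this node (assumed)
    feature: Optional[int] = None  # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None  # cases with x[feature] <= threshold
    right: Optional["Node"] = None  # cases with x[feature] > threshold

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, x: List[float]) -> int:
    # Follow the threshold tests down to a leaf and return its class.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def reduced_error_prune(node: Node, cases: List[Case]) -> int:
    """Prune `node` in place, bottom-up; return its error count on the pruning cases."""
    # Errors if this node were turned into a leaf predicting its training majority class.
    leaf_errors = sum(1 for _, y in cases if y != node.label)
    if node.is_leaf():
        return leaf_errors
    # Route the pruning cases to the children and prune those subtrees first.
    left_cases = [(x, y) for x, y in cases if x[node.feature] <= node.threshold]
    right_cases = [(x, y) for x, y in cases if x[node.feature] > node.threshold]
    subtree_errors = (reduced_error_prune(node.left, left_cases)
                      + reduced_error_prune(node.right, right_cases))
    if leaf_errors <= subtree_errors:
        # Collapsing to a leaf does not increase pruning-set error: prune (ties favor the smaller tree).
        node.feature, node.left, node.right = None, None, None
        return leaf_errors
    return subtree_errors


if __name__ == "__main__":
    # Hypothetical tree: the right branch over-splits on feature 1.
    root = Node(label=0, feature=0, threshold=0.5,
                left=Node(label=0),
                right=Node(label=1, feature=1, threshold=0.5,
                           left=Node(label=1), right=Node(label=0)))
    prune_set = [([0.2, 0.0], 0), ([0.8, 0.2], 1), ([0.8, 0.9], 1)]
    print(reduced_error_prune(root, prune_set), root.right.is_leaf())
```

In this toy example, the right subtree makes one error on the pruning set, while a single leaf labeled 1 makes none, so REP collapses it. The root split is kept because it is error-free on the pruning set, while a root leaf would make two errors. One consequence of this formulation: a node that receives no pruning cases always has zero leaf errors and is therefore pruned.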
