Efficient C4.5 [classification algorithm]

7 August 2002

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering

Vol. 14 (2), 438-444
https://doi.org/10.1109/69.991727

Abstract

We present an analytic evaluation of the runtime behavior of the C4.5 algorithm which highlights some efficiency improvements. Based on the analytic evaluation, we have implemented a more efficient version of the algorithm, called EC4.5. It improves on C4.5 by adopting the best among three strategies for computing the information gain of continuous attributes. All the strategies adopt a binary search of the threshold in the whole training set starting from the local threshold computed at a node. The first strategy computes the local threshold using the algorithm of C4.5, which, in particular, sorts cases by means of the quicksort method. The second strategy also uses the algorithm of C4.5, but adopts a counting sort method. The third strategy calculates the local threshold using a main-memory version of the RainForest algorithm, which does not need sorting. Our implementation computes the same decision trees as C4.5 with a performance gain of up to five times.
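
As an illustration of the first strategy described in the abstract, the following is a minimal sketch, not the EC4.5 implementation. It assumes the local threshold is taken as the midpoint between two adjacent sorted values at a node, and that the final threshold is the largest value in the whole training set that does not exceed that midpoint, found by binary search. The function names (`entropy`, `best_local_threshold`, `snap_to_training_value`) are hypothetical, and Python's built-in sort stands in for quicksort.

```python
import bisect
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a class-count mapping with `total` cases."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_local_threshold(values, labels):
    """Sort the cases at a node by attribute value and scan every boundary
    between distinct values, keeping the split with maximum information gain.

    Returns (gain, midpoint), or None if all values are equal.
    """
    cases = sorted(zip(values, labels), key=lambda case: case[0])
    n = len(cases)
    right = Counter(label for _, label in cases)  # classes above the split
    left = Counter()                              # classes at or below the split
    base = entropy(right, n)
    best = None
    for i in range(n - 1):
        value, label = cases[i]
        left[label] += 1
        right[label] -= 1
        next_value = cases[i + 1][0]
        if value == next_value:
            continue  # no threshold can separate equal values
        n_left, n_right = i + 1, n - (i + 1)
        gain = (base
                - (n_left / n) * entropy(left, n_left)
                - (n_right / n) * entropy(right, n_right))
        if best is None or gain > best[0]:
            best = (gain, (value + next_value) / 2)
    return best


def snap_to_training_value(sorted_all_values, local_threshold):
    """Binary search for the largest value in the whole (pre-sorted) training
    set that does not exceed the local threshold; None if there is none."""
    idx = bisect.bisect_right(sorted_all_values, local_threshold)
    return sorted_all_values[idx - 1] if idx > 0 else None


# Hypothetical data: four cases reach the node, six values exist overall.
node_values = [8.0, 1.0, 9.0, 2.0]
node_labels = ["b", "a", "b", "a"]
all_values = sorted([1.0, 2.0, 4.0, 6.0, 8.0, 9.0])

gain, local = best_local_threshold(node_values, node_labels)
threshold = snap_to_training_value(all_values, local)
```

The sort is done once per attribute at each node, so its cost dominates the scan. This is why the abstract's second and third strategies replace it with a counting sort or avoid sorting altogether.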

This publication has 11 references indexed in Scilit:

ScalParC: a new scalable and efficient parallel classification algorithm for mining large datasets
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
RainForest—A Framework for Fast Decision Tree Construction of Large Datasets
Data Mining and Knowledge Discovery, 2000
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
Machine Learning, 2000
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning
Data Mining and Knowledge Discovery, 2000
BOAT—optimistic decision tree construction
Published by Association for Computing Machinery (ACM), 1999
General and Efficient Multisplitting of Numerical Attributes
Machine Learning, 1999
Parallel Formulations of Decision-Tree Classification Algorithms
Data Mining and Knowledge Discovery, 1999
Use of contextual information for feature ranking and discretization
IEEE Transactions on Knowledge and Data Engineering, 1997
On the handling of continuous-valued attributes in decision tree generation
Machine Learning, 1992
Induction of decision trees
Machine Learning, 1986