Parallel formulations of decision-tree classification algorithms
- 27 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 237-244
- https://doi.org/10.1109/icpp.1998.708491
Abstract
Classification decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing and fraud detection. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherently dynamic nature of the computation. We present parallel formulations of a classification decision tree learning algorithm based on induction. We describe two basic parallel formulations: one based on the Synchronous Tree Construction Approach and the other on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of these methods and propose a hybrid method that combines their strengths. Experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.
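The abstract names the two parallel formulations but does not spell out their mechanics. The following is a minimal, sequentially simulated Python sketch of the general idea only, not the authors' implementation: the toy dataset, function names, Gini split criterion, and the way "processors" are modeled as integer ids are all illustrative assumptions.

```python
# Conceptual sketch (not the paper's code): contrasts the Synchronous and
# Partitioned tree construction approaches. Processors are simulated, not real.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (illustrative split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(records, attributes):
    """Return the (score, attribute, value) split with lowest weighted impurity."""
    best = None
    for attr in attributes:
        for value in {r[attr] for r in records}:
            left = [r["label"] for r in records if r[attr] == value]
            right = [r["label"] for r in records if r[attr] != value]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best

def synchronous_step(frontier, attributes, n_procs):
    """Synchronous approach: all processors cooperate on the SAME frontier node,
    each evaluating a disjoint subset of attributes; results are then combined."""
    for node_records in frontier:
        shards = [attributes[p::n_procs] for p in range(n_procs)]  # per-processor attributes
        candidates = [best_split(node_records, s) for s in shards if s]
        yield min(candidates)  # "communication" step: pick the globally best split

def partitioned_step(frontier, attributes, n_procs):
    """Partitioned approach: once enough frontier nodes exist, each processor
    takes WHOLE nodes (subtrees) and expands them independently."""
    for p in range(n_procs):
        for node_records in frontier[p::n_procs]:  # nodes assigned to processor p
            yield best_split(node_records, attributes)

# Toy usage with a hypothetical four-record dataset.
data = [
    {"outlook": "sunny", "windy": "no",  "label": "play"},
    {"outlook": "rain",  "windy": "yes", "label": "stay"},
    {"outlook": "sunny", "windy": "yes", "label": "stay"},
    {"outlook": "rain",  "windy": "no",  "label": "play"},
]
attrs = ["outlook", "windy"]
print(list(synchronous_step([data], attrs, n_procs=2)))
print(list(partitioned_step([data, data], attrs, n_procs=2)))
```

In this framing, the hybrid method the abstract mentions would start in the synchronous mode while the frontier is small and switch to the partitioned mode once communication overhead outweighs the benefit of cooperating on each node; the exact switching criterion is given in the paper, not here.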