Optimal partitioning for classification and regression trees
- 1 April 1991
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 13 (4) , 340-354
- https://doi.org/10.1109/34.88569
Abstract
In designing a decision tree for classification or regression, one selects at each node a feature to be tested, and partitions the range of that feature into a small number of bins, each bin corresponding to a child of the node. When the feature's range is discrete with N unordered outcomes, the optimal partition, that is, the partition minimizing an expected loss, is usually found by an exhaustive search through all possible partitions. Since the number of possible partitions grows exponentially in N, this approach is impractical when N is larger than about 10 or 20. In this paper, we present an iterative algorithm that finds a locally optimal partition for an arbitrary loss function, in time linear in N for each iteration. The algorithm is a K-means like clustering algorithm that uses as its distance measure a generalization of Kullback's information divergence. Moreover, we prove that the globally optimal partition must satisfy a nearest neighbor condition using divergence as the distance measure. These results generalize similar results of Breiman et al. to an arbitrary number of classes or regression variables and to an arbitrary number of bins. We also provide experimental results on a text-to-speech example, and we suggest additional applications of the algorithm, including the design of variable combinations, surrogate splits, composite nodes, and decision graphs.Keywords
This publication has 45 references indexed in Scilit:
- SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivationNature Genetics, 2008
- Complexity of strings in the class of Markov sourcesIEEE Transactions on Information Theory, 1986
- Least squares quantization in PCMIEEE Transactions on Information Theory, 1982
- Rate-distortion speech coding with a minimum discrimination information distortion measureIEEE Transactions on Information Theory, 1981
- Speech coding based upon vector quantizationIEEE Transactions on Acoustics, Speech, and Signal Processing, 1980
- Regression and ANOVA with Zero-One Data: Measures of Residual VariationJournal of the American Statistical Association, 1978
- Efficient decision tree design for discrete variable pattern recognition problemsPattern Recognition, 1977
- Conversion of limited-entry decision tables to computer programs—a proposed modification to Pollack's algorithmCommunications of the ACM, 1971
- Conversion of Limited-Entry Decision Tables to Optimal Computer Programs IIJournal of the ACM, 1967
- Problems in the Analysis of Survey Data, and a ProposalJournal of the American Statistical Association, 1963