Induction of Decision Trees via Evolutionary Programming
- 20 March 2004
- journal article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 44 (3), 862-870
- https://doi.org/10.1021/ci034188s
Abstract
Decision trees have been used extensively in cheminformatics for modeling various biochemical endpoints, including receptor-ligand binding, ADME properties, environmental impact, and toxicity. The traditional approach to inducing a decision tree from a given training set is recursive partitioning, which selects partitioning variables and their values in a greedy manner to optimize a given measure of purity. This methodology has numerous benefits, including classifier interpretability and the capability of modeling nonlinear relationships. The greedy nature of induction, however, may fail to elucidate underlying relationships between the data and endpoints. Using evolutionary programming, decision trees are induced which are significantly more accurate than trees induced by recursive partitioning. Furthermore, when assessed on previously unseen data in a 10-fold cross-validated manner, trees induced by evolutionary programming exhibit significantly higher accuracy. This methodology is compared to single-tree and multiple-tree recursive partitioning in two domains (aerobic biodegradability and hepatotoxicity) and shown to produce less complex classifiers with average increases in predictive accuracy of 5-10% over the traditional method.
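To make the greedy step of recursive partitioning concrete, here is a minimal Python sketch of choosing a single split. It is illustrative only and is not taken from the article: the use of Gini impurity as the purity measure, the midpoint thresholds, and the names `gini` and `best_split` are all assumptions made for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Greedy choice of one split (hypothetical helper, not the paper's code).

    Tries every feature and every midpoint between adjacent distinct values,
    and returns (feature_index, threshold, weighted_impurity) for the split
    with the lowest size-weighted Gini impurity of the two child nodes.
    Returns None if no feature has two distinct values.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best
```

Recursive partitioning applies this choice to the root, then repeats it independently on each child node. Because every split is fixed using only the data that reaches that node, a split that looks best locally is never revisited, which is the limitation the abstract attributes to greedy induction.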