Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble‐based methods?
Open Access
- 6 July 2012
- journal article
- research article
- Published by Wiley in Biometrical Journal
- Vol. 54 (5) , 657-673
- https://doi.org/10.1002/bimj.201100251
Abstract
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease.
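To make the idea of bootstrap aggregation concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single synthetic predictor (an age-like variable) and a simulated binary outcome; the data, the function names (`fit_stump`, `fit_bagged`, `brier`), and the restriction to depth-1 trees are all illustrative assumptions. It fits one regression tree and a bagged ensemble of 50 trees, then compares out-of-sample Brier scores.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: choose the split minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    # (score, threshold, left mean, right mean); threshold None means "no split".
    best = (float("inf"), None, total / n, total / n)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Minimizing squared error is equivalent to minimizing this score.
        score = -(i * left_mean ** 2 + (n - i) * right_mean ** 2)
        if score < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    if threshold is None:
        return left_mean
    return left_mean if x <= threshold else right_mean


def fit_bagged(xs, ys, n_trees, rng):
    """Bagging: fit one tree per bootstrap sample drawn with replacement."""
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all trees in the ensemble."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def brier(ps, ys):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(ps, ys)) / len(ys)


def simulate(n, rng):
    """Synthetic data (an assumption): risk rises smoothly with the predictor."""
    xs = [rng.uniform(40, 90) for _ in range(n)]
    ys = [1 if rng.random() < 1 / (1 + math.exp(-(x - 75) / 8)) else 0 for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = simulate(500, rng)
test_x, test_y = simulate(500, rng)

single = fit_stump(train_x, train_y)
bagged = fit_bagged(train_x, train_y, 50, rng)

single_pred = [predict_stump(single, x) for x in test_x]
bagged_pred = [predict_bagged(bagged, x) for x in test_x]

print("single tree Brier score:", brier(single_pred, test_y))
print("bagged trees Brier score:", brier(bagged_pred, test_y))
```

A single tree predicts a step function of the predictor, whereas averaging trees fit to different bootstrap samples smooths those steps. Random forests and boosted trees build on the same tree learner but are not shown here.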