A comparison of imputation techniques for handling missing predictor values in a risk model with a binary outcome
- 1 June 2007
- journal article
- research article
- Published by SAGE Publications in Statistical Methods in Medical Research
- Vol. 16 (3), 277-298
- https://doi.org/10.1177/0962280206074466
Abstract
Risk models that aim to predict the future course and outcome of disease processes are increasingly used in health research, and it is important that they are accurate and reliable. Most of these risk models are fitted using routinely collected data in hospitals or general practices. Clinical outcomes such as short-term mortality will be near-complete, but many of the predictors may have missing values. A common approach to dealing with this is to perform a complete-case analysis. However, this may lead to overfitted models and biased estimates if entire patient subgroups are excluded. The aim of this paper is to investigate a number of methods for imputing missing data to evaluate their effect on risk model estimation and the reliability of the predictions. Multiple imputation methods, including hotdecking and multiple imputation by chained equations (MICE), were investigated along with several single imputation methods. A large national cardiac surgery database was used to create simulated yet realistic datasets. The results suggest that complete-case analysis may produce unreliable risk predictions and should be avoided. Conditional mean imputation performed well in our scenario, but may not be appropriate if using variable selection methods. MICE was amongst the best performing multiple imputation methods with regard to the quality of the predictions. Additionally, it produced the least biased estimates, with good coverage, and hence is recommended for use in practice.
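To make the contrast between complete-case analysis and conditional mean imputation concrete, the following is a minimal Python sketch on toy data. It is not the authors' implementation: the predictor names (`age`, `creatinine`), the toy values, the missingness pattern, and the use of a single-predictor least-squares fit are all assumptions made for illustration. MICE and hot-decking are not shown.

```python
def complete_cases(rows):
    """Complete-case analysis: keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r)]


def conditional_mean_impute(rows, target=1, given=0):
    """Single imputation: replace a missing rows[i][target] with its
    least-squares prediction from rows[i][given], fitted on observed rows."""
    observed = [(r[given], r[target]) for r in rows if r[target] is not None]
    n = len(observed)
    mean_x = sum(x for x, _ in observed) / n
    mean_y = sum(y for _, y in observed) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    filled = []
    for r in rows:
        r = list(r)
        if r[target] is None:
            r[target] = intercept + slope * r[given]
        filled.append(tuple(r))
    return filled


# Hypothetical toy data: (age, creatinine), with every 4th creatinine missing.
rows = []
for i in range(40):
    age = 40 + i
    creatinine = 50 + 1.5 * age + (2 if i % 2 == 0 else -2)
    if i % 4 == 0:
        creatinine = None  # assumed missingness pattern, for illustration only
    rows.append((age, creatinine))

cc = complete_cases(rows)
imputed = conditional_mean_impute(rows)

print("all rows:", len(rows))
print("complete cases:", len(cc))
print("rows after imputation:", len(imputed))
```

Complete-case analysis discards every record with a missing predictor, whereas conditional mean imputation retains all records. Because each missing value is filled with a single fitted value, the imputed data understate variability, which is one reason multiple imputation methods such as MICE are generally preferred when valid standard errors are needed.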