A comparison of imputation techniques for handling missing predictor values in a risk model with a binary outcome

1 June 2007

journal article
research article
Published by SAGE Publications in Statistical Methods in Medical Research

Vol. 16 (3), 277-298
https://doi.org/10.1177/0962280206074466

Abstract

Risk models that aim to predict the future course and outcome of disease processes are increasingly used in health research, and it is important that they are accurate and reliable. Most of these risk models are fitted using data routinely collected in hospitals or general practices. Clinical outcomes such as short-term mortality will be near-complete, but many of the predictors may have missing values. A common approach to dealing with missing predictor values is to perform a complete-case analysis. However, this may lead to overfitted models and biased estimates if entire patient subgroups are excluded. The aim of this paper is to investigate a number of methods for imputing missing data and to evaluate their effect on risk model estimation and the reliability of the predictions. Multiple imputation methods, including hot-decking and multiple imputation by chained equations (MICE), were investigated along with several single imputation methods. A large national cardiac surgery database was used to create simulated yet realistic datasets. The results suggest that complete-case analysis may produce unreliable risk predictions and should be avoided. Conditional mean imputation performed well in our scenario, but may not be appropriate if using variable selection methods. MICE was among the best-performing multiple imputation methods with regard to the quality of the predictions. Additionally, it produced the least biased estimates, with good coverage, and hence is recommended for use in practice.
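
The contrast between complete-case analysis and conditional mean imputation can be illustrated with a small toy simulation. The sketch below is not the authors' simulation design and does not use their cardiac surgery data: the two predictors `x1` and `x2`, the missingness probabilities, and the helper names `simulate`, `fit_line`, and `compare` are all assumptions made for illustration. It makes `x2` missing more often when `x1` is large, then compares the mean of `x2` estimated from complete cases with the mean obtained after filling each gap with its predicted value from a regression of `x2` on `x1`.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Toy data: x2 depends on x1, and x2 is missing more often when x1 > 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 0.8 * x1 + rng.gauss(0.0, 0.6)
        # Assumed missingness mechanism: depends only on the observed x1.
        p_missing = 0.8 if x1 > 0 else 0.1
        observed = rng.random() >= p_missing
        rows.append((x1, x2, observed))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def compare(rows):
    """Mean of x2 from full data, complete cases, and conditional mean imputation."""
    full_mean = statistics.fmean([x2 for _, x2, _ in rows])

    # Complete-case analysis: drop every row where x2 is missing.
    complete = [(x1, x2) for x1, x2, obs in rows if obs]
    cc_mean = statistics.fmean([x2 for _, x2 in complete])

    # Conditional mean imputation: predict missing x2 from x1,
    # using a regression fitted on the complete cases only.
    intercept, slope = fit_line([x1 for x1, _ in complete],
                                [x2 for _, x2 in complete])
    filled = [x2 if obs else intercept + slope * x1 for x1, x2, obs in rows]
    cond_mean = statistics.fmean(filled)

    return {"full": full_mean,
            "complete_case": cc_mean,
            "conditional_mean": cond_mean}


results = compare(simulate())
for name, value in results.items():
    print(f"{name:>16}: {value: .3f}")
```

In this toy setting the complete-case estimate is pulled well away from the full-data value, because the retained rows over-represent low `x1`, while the conditionally imputed estimate stays close to it. Being a single imputation, the sketch does not propagate the uncertainty of the imputed values, which is the gap that multiple imputation methods such as MICE are designed to address.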
