Imputing missing covariate values for the Cox model
Open Access
- 19 May 2009
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 28 (15) , 1982-1998
- https://doi.org/10.1002/sim.3618
Abstract
Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear.

We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H0(T), we approximate it by the Nelson–Aalen estimator of H(T) or estimate it by Cox regression.

We compare the methods using simulation studies. We find that using log T biases covariate‐outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson–Aalen estimator of H(T) in the imputation model.

Copyright © 2009 John Wiley & Sons, Ltd.
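As a rough illustration of the recommendation above, the following Python sketch computes the Nelson–Aalen estimate of the cumulative hazard at each subject's observed time and assembles the predictors (D, H(T), Z) that an imputation model for X would use. It is not taken from the paper: the function names `nelson_aalen` and `imputation_predictors` are hypothetical, a single fully observed covariate Z is assumed, and subjects censored at a time tied with an event are counted as still at risk at that time.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard evaluated at each subject's own time.

    times  : observed event or censoring times T
    events : event indicators D (1 = event, 0 = censored)
    """
    event_counts = Counter(t for t, d in zip(times, events) if d == 1)
    exit_counts = Counter(times)  # subjects leaving the risk set at each time

    at_risk = len(times)
    cum_hazard = 0.0
    hazard_at = {}
    for t in sorted(exit_counts):
        # Increment is (events at t) / (number still at risk just before t).
        cum_hazard += event_counts.get(t, 0) / at_risk
        hazard_at[t] = cum_hazard
        at_risk -= exit_counts[t]
    return [hazard_at[t] for t in times]


def imputation_predictors(times, events, z):
    """Rows (D, H(T), Z) to use as predictors when imputing X (hypothetical helper)."""
    h = nelson_aalen(times, events)
    return list(zip(events, h, z))


# Small made-up data set, for illustration only.
T = [2, 3, 3, 5, 8]
D = [1, 0, 1, 1, 0]
Z = [0.1, 0.2, 0.3, 0.4, 0.5]
for row in imputation_predictors(T, D, Z):
    print(row)
```

The rows produced this way would then be passed to a logistic regression (binary X) or a linear regression (Normal X) fitted on the subjects with X observed, in place of the more common choice of D and log T.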
Funding Information
- MRC (U.1052.00.006)