Multiple imputation in a large-scale complex survey: a practical guide

4 August 2009

journal article
research article
Published by SAGE Publications in Statistical Methods in Medical Research

Vol. 19 (6), 653-670
https://doi.org/10.1177/0962280208101273

Abstract

The Cancer Care Outcomes Research and Surveillance (CanCORS) Consortium is a multisite, multimode, multiwave study of the quality and patterns of care delivered to population-based cohorts of newly diagnosed patients with lung and colorectal cancer. As is typical in observational studies, missing data are a serious concern for CanCORS, following complicated patterns that pose severe challenges for the consortium investigators. Despite the popularity of multiple imputation of missing data, its acceptance and application still lag in large-scale studies with complicated data sets such as CanCORS. We use sequential regression multiple imputation, implemented in publicly available software, to deal with non-response in the CanCORS surveys and construct a centralised completed database that can easily be used by investigators from multiple sites. Our work illustrates the feasibility of multiple imputation in a large-scale multiobjective survey, showing its capacity to handle complex missing data. We present the implementation process in detail as an example for practitioners and discuss some of the challenging issues that need further research.
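
Sequential regression multiple imputation fills in each incomplete variable in turn from a regression on all the other variables. It cycles through the variables several times and then repeats the whole process to obtain several completed data sets. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the article:

- every variable is continuous and is modelled by a normal linear regression with a flat prior;
- survey weights and design features are ignored;
- the starting values are simple random draws from each column's observed values;
- `srmi` and `rubin_combine` are hypothetical helper names, not the interface of the software used for CanCORS.

The sketch also pools one estimate across the completed data sets with the standard multiple-imputation combining rules (Rubin's rules).

```python
import numpy as np


def srmi(data, n_imputations=5, n_iter=10, seed=0):
    """Hypothetical sketch of sequential regression multiple imputation.

    Assumes every column is continuous and modelled by a normal linear
    regression on all other columns (flat prior); np.nan marks missing values.
    Returns a list of n_imputations completed copies of `data`.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    n, p = data.shape
    completed = []
    for _imputation in range(n_imputations):
        x = data.copy()
        # Simple starting point: draw each column's missing entries from its observed values.
        for j in range(p):
            miss_j = missing[:, j]
            x[miss_j, j] = rng.choice(x[~miss_j, j], size=miss_j.sum())
        # Cycle through the incomplete columns, re-imputing each one in turn.
        for _iteration in range(n_iter):
            for j in range(p):
                miss_j = missing[:, j]
                if not miss_j.any():
                    continue
                design = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
                X_obs, y_obs = design[~miss_j], x[~miss_j, j]
                # Least-squares fit on the rows where column j was observed.
                beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
                resid = y_obs - X_obs @ beta_hat
                df = X_obs.shape[0] - X_obs.shape[1]
                # Posterior draws under p(beta, sigma^2) proportional to 1/sigma^2.
                sigma2 = (resid @ resid) / rng.chisquare(df)
                cov = sigma2 * np.linalg.inv(X_obs.T @ X_obs)
                beta = rng.multivariate_normal(beta_hat, (cov + cov.T) / 2)
                # Replace the missing entries with posterior predictive draws.
                noise = rng.normal(0.0, np.sqrt(sigma2), size=miss_j.sum())
                x[miss_j, j] = design[miss_j] @ beta + noise
        completed.append(x)
    return completed


def rubin_combine(estimates, variances):
    """Pool m completed-data estimates and their variances (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                 # combined point estimate
    u_bar = u.mean()                 # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    total = u_bar + (1 + 1 / m) * b  # total variance
    return q_bar, total


# Usage on simulated data: columns 1 and 2 are partly missing completely at random.
rng = np.random.default_rng(42)
n = 200
z = rng.normal(size=n)
full = np.column_stack([z, 2.0 * z + rng.normal(size=n), -z + rng.normal(size=n)])
data = full.copy()
data[rng.random(n) < 0.2, 1] = np.nan
data[rng.random(n) < 0.1, 2] = np.nan

imputed = srmi(data, n_imputations=5, n_iter=10, seed=1)
estimates = [d[:, 1].mean() for d in imputed]
variances = [d[:, 1].var(ddof=1) / n for d in imputed]
q_bar, total_var = rubin_combine(estimates, variances)
print(f"pooled mean of column 1: {q_bar:.3f} (SE {np.sqrt(total_var):.3f})")
```

A production SRMI implementation would typically choose the regression model according to each variable's type, for example logistic regression for a binary item. It would also respect the restrictions that survey questionnaires impose. This sketch omits all of that to keep the iteration structure visible.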
