Multiple Imputation for Incomplete Data With Semicontinuous Variables

Abstract
We consider the application of multiple imputation to data containing not only partially missing categorical and continuous variables, but also partially missing 'semicontinuous' variables (variables that take on a single discrete value with positive probability but are otherwise continuously distributed). As an imputation model for data sets of this type, we introduce an extension of the standard general location model proposed by Olkin and Tate; our extension, the blocked general location model, provides a robust and general strategy for handling partially observed semicontinuous variables. In particular, we incorporate a two-level model for the semicontinuous variables into the general location model. The first level models the probability that the semicontinuous variable takes on its point mass value, and the second level models the distribution of the variable given that it is not at its point mass. In addition, we introduce EM and data augmentation algorithms for the blocked general location model with missing data; these can be used to generate imputations under the proposed model and have been implemented in publicly available software. We illustrate our model and computational methods via a simulation study and an analysis of a survey of Massachusetts Megabucks Lottery winners.
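The two-level structure described above can be illustrated with a deliberately simplified sketch. It models a single semicontinuous variable on its own and ignores the categorical and continuous variables that the blocked general location model conditions on. Level one is the probability of sitting at the point mass. Level two, as an illustrative assumption not taken from the paper, is a log-normal distribution for values off the point mass. The sketch also plugs in point estimates instead of drawing parameters from their posterior, so it is not the authors' EM or data augmentation algorithm and does not produce proper multiple imputations. The function names `fit_two_level`, `impute_once`, and `multiply_impute` are hypothetical.

```python
import math
import random
import statistics


def fit_two_level(observed, point_value=0.0):
    """Estimate both levels from the observed (non-missing) values.

    Level 1: share of observed values equal to the point mass.
    Level 2: mean and SD of log(value) among values off the point mass.
    The log-normal second level is an illustrative assumption; it needs at
    least two positive off-point values.
    """
    at_point = [v for v in observed if v == point_value]
    off_point = [v for v in observed if v != point_value]
    p_point = len(at_point) / len(observed)
    logs = [math.log(v) for v in off_point]
    return p_point, statistics.mean(logs), statistics.stdev(logs)


def impute_once(values, params, rng, point_value=0.0):
    """Fill each None: point mass with probability p, else exp(Normal(mu, sigma))."""
    p_point, mu, sigma = params
    filled = []
    for v in values:
        if v is None:
            if rng.random() < p_point:   # level 1: at the point mass?
                v = point_value
            else:                        # level 2: continuous part
                v = math.exp(rng.gauss(mu, sigma))
        filled.append(v)
    return filled


def multiply_impute(values, m=5, seed=0):
    """Return m completed copies of `values`, each imputed with fresh random draws.

    Parameters are plug-in estimates (no posterior draws), so this understates
    between-imputation variability compared with proper multiple imputation.
    """
    observed = [v for v in values if v is not None]
    params = fit_two_level(observed)
    rng = random.Random(seed)
    return [impute_once(values, params, rng) for _ in range(m)]
```

For example, `multiply_impute([0.0, 0.0, 12.5, None, 3.2, 0.0, None, 48.0], m=3)` returns three completed lists. Each missing entry is either exactly 0.0 or a positive draw, and the observed entries are left unchanged.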
