Pattern-Mixture Models for Multivariate Incomplete Data

Abstract
Consider a random sample on variables X1, …, XV with some values of XV missing. Selection models specify the distribution of X1, …, XV over respondents and nonrespondents to XV, and the conditional probability that XV is missing given X1, …, XV. In contrast, pattern-mixture models specify the conditional distribution of X1, …, XV given that XV is observed or missing, respectively, and the marginal distribution of the binary indicator for whether or not XV is missing. For multivariate data with a general pattern of missing values, the literature has tended to adopt the selection-modeling approach (see, for example, Little and Rubin); here, pattern-mixture models are proposed for this more general problem. Pattern-mixture models are chronically underidentified; in particular, for the case of univariate nonresponse mentioned above, there are no data on the distribution of XV given X1, …, XV–1 in the stratum with XV missing. Thus the models require restrictions or prior information to identify the parameters. Complete-case restrictions tie unidentified parameters to their (identified) analogs in the stratum of complete cases. Alternative types of restriction tie unidentified parameters to parameters in other missing-value patterns or sets of such patterns. This large set of possible identifying restrictions yields a rich class of missing-data models. Unlike ignorable selection models, which generally require iterative methods except for special missing-data patterns, some pattern-mixture models yield explicit maximum likelihood (ML) estimates for general patterns. Such models are readily amenable to Bayesian methods and form a convenient basis for multiple imputation. Some previously considered noniterative estimation methods are shown to be ML under a pattern-mixture model.
For example, Buck's method for continuous data, corrected as in Beale and Little (1975), and Brown's estimators for nonrandomly missing data are ML for pattern-mixture models with particular complete-case restrictions. Available-case analyses, where the mean and variance of Xj are computed using all cases with Xj observed and the correlation (or covariance) of Xj and Xk is computed using all cases with Xj and Xk observed, are also close to ML for another pattern-mixture model. Asymptotic theory for this class of estimators is outlined.
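
As an illustration of the available-case statistics described above, the following Python sketch computes the mean and variance of each Xj from all cases with Xj observed, and the correlation of Xj and Xk from all cases with both observed. The function name `available_case_moments`, the coding of missing values as NaN, and the small data set are assumptions made for this illustration only. The covariance is formed here as the pairwise correlation multiplied by the available-case standard deviations, which is one of several possible variants and is not claimed to be the exact estimator studied in the paper.

```python
import numpy as np


def available_case_moments(X):
    """Available-case means and covariance matrix; NaN marks a missing value.

    Hypothetical helper for illustration. It assumes every variable has at
    least two observed values and every pair of variables is jointly
    observed on at least two cases with nonzero variance.
    """
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)
    n_vars = X.shape[1]

    # Mean and variance of X_j use all cases with X_j observed.
    mean = np.array([X[obs[:, j], j].mean() for j in range(n_vars)])
    var = np.array([X[obs[:, j], j].var(ddof=1) for j in range(n_vars)])

    cov = np.diag(var)
    for j in range(n_vars):
        for k in range(j + 1, n_vars):
            # Correlation of X_j and X_k uses all cases with both observed.
            both = obs[:, j] & obs[:, k]
            r = np.corrcoef(X[both, j], X[both, k])[0, 1]
            # Rescale by the available-case standard deviations.
            cov[j, k] = cov[k, j] = r * np.sqrt(var[j] * var[k])
    return mean, cov


# Small made-up data set with a general (non-monotone) missingness pattern.
data = [
    [1.0, 2.0, np.nan],
    [2.0, 4.0, 1.0],
    [3.0, np.nan, 2.0],
    [4.0, 8.0, 3.0],
    [np.nan, 10.0, 5.0],
]
mean, cov = available_case_moments(data)
print(mean)
print(cov)
```

Because each entry of the covariance matrix may be based on a different subset of cases, the resulting matrix is not guaranteed to be positive semidefinite, a point worth checking before using it in further analysis.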