Regression Analysis with Missing Covariate Data Using Estimating Equations

Abstract
In regression analysis, missing covariate data has been among the most common problems. Frequently, practitioners adopt the so-called complete-case analysis, i.e., performing the analysis on only a complete dataset after excluding records with missing covariates. Performing a complete-case analysis is convenient with existing statistical packages, but it may be inefficient since the observed outcomes and covariates on those records with missing covariates are not used. It can even give misleading statistical inference if missing is not completely at random. This paper introduces a joint estimating equation (JEE) for regression analysis in the presence of missing observations on one covariate, which may be thought of as a method in a general framework for the missing covariate data problem proposed by Robins, Rotnitzky, and Zhao (1994, Journal of the American Statistical Association 89, 846-866). A generalization of JEE to more than one such covariate is discussed. The JEE is generally applicable to estimating regression coefficients from a regression model, including linear and logistic regression. Provided that the missing covariate data is either missing completely at random or missing at random (in addition to mild regularity conditions), estimates of regression coefficients from the JEE are consistent and have an asymptotic normal distribution. Simulation results show that the asymptotic distribution of estimated coefficients performs well in finite samples. Also shown through the simulation study is that the validity of JEE estimates depends on the correct specification of the probability function that characterizes the missing mechanism, suggesting a need for further research on how to robustify the estimation from making this nuisance assumption. Finally, the JEE is illustrated with an application from a case-control study of diet and thyroid cancer.

This publication has 0 references indexed in Scilit: