Missing Data in Regression Analysis

1 January 1968

journal article
research article
Published by Oxford University Press (OUP) in Journal of the Royal Statistical Society Series B: Statistical Methodology

Vol. 30 (1), 67-82
https://doi.org/10.1111/j.2517-6161.1968.tb01507.x

Abstract

Summary: Two alternative methods for dealing with the problem of missing observations in regression analysis are investigated. One is to discard all incomplete observations and to apply the ordinary least-squares technique only to the complete observations. The alternative is to compute the covariances between all pairs of variables, each time using only the observations having values of both variables, and to apply these covariances in constructing the system of normal equations. The former is shown to be equivalent to the Fisher–Yates method of assigning “neutral” values to missing entries in experimental design. The investigation is carried out by means of simulation. Eight sets of regression data were generated, differing from each other with respect to important factors. Various deletion patterns were applied to these regression data. The estimates resulting from applying the two alternative methods to the data with missing entries were compared with the known regression equations. In almost all the cases investigated, the former method (ordinary least squares applied only to the complete observations) is judged superior. However, when the proportion of incomplete observations is high or when the pattern of the missing entries is highly non-random, it seems plausible that one of the many methods of assigning values to the missing entries should be applied.
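
The two methods compared in the summary can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the paper's own procedure or simulation design: the function names `listwise_ols` and `pairwise_ols`, the synthetic data, the 10% random deletion rate, and the choice to centre each pairwise covariance on the means of the jointly observed rows (dividing by the pairwise count) are all assumptions made here for the example.

```python
import numpy as np

def listwise_ols(X, y):
    """Method 1: discard incomplete rows, then ordinary least squares."""
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    Xc, yc = X[complete], y[complete]
    A = np.column_stack([np.ones(len(yc)), Xc])   # add intercept column
    coef, *_ = np.linalg.lstsq(A, yc, rcond=None)
    return coef                                   # [intercept, slopes...]

def pairwise_ols(X, y):
    """Method 2: build normal equations from pairwise-available covariances."""
    Z = np.column_stack([X, y])                   # last column is the response
    p = Z.shape[1]
    S = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            # Use only rows where both variables i and j are observed.
            ok = ~np.isnan(Z[:, i]) & ~np.isnan(Z[:, j])
            zi, zj = Z[ok, i], Z[ok, j]
            S[i, j] = S[j, i] = np.mean((zi - zi.mean()) * (zj - zj.mean()))
    means = np.nanmean(Z, axis=0)                 # per-variable available means
    slopes = np.linalg.solve(S[:-1, :-1], S[:-1, -1])
    intercept = means[-1] - means[:-1] @ slopes
    return np.concatenate([[intercept], slopes])

# Hypothetical data: known regression equation y = 1 + 2*x1 - 1*x2 + noise.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Delete about 10% of the regressor entries completely at random.
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.1] = np.nan

b_list = listwise_ols(X_miss, y)
b_pair = pairwise_ols(X_miss, y)
print("listwise:", b_list)
print("pairwise:", b_pair)
```

With no missing entries the two functions return the same coefficients, so any difference between `b_list` and `b_pair` comes from how each method handles the deleted values. A single random deletion pattern like this one says nothing about which method is better in general; the paper's conclusion rests on eight data sets and several deletion patterns.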
