Missing Data in Regression Analysis
- 1 January 1968
- journal article
- research article
- Published by Oxford University Press (OUP) in Journal of the Royal Statistical Society Series B: Statistical Methodology
- Vol. 30 (1) , 67-82
- https://doi.org/10.1111/j.2517-6161.1968.tb01507.x
Abstract
Summary: Two alternative methods for dealing with the problem of missing observations in regression analysis are investigated. One is to discard all incomplete observations and to apply the ordinary least-squares technique only to the complete observations. The alternative is to compute the covariances between all pairs of variables, each time using only the observations having values of both variables, and to apply these covariances in constructing the system of normal equations. The former is shown to be equivalent to the Fisher–Yates method of assigning “neutral” values to missing entries in experimental design. The investigation is carried out by means of simulation. Eight sets of regression data were generated, differing from each other with respect to important factors. Various deletion patterns are applied to these regression data. The estimates resulting from applying the two alternative methods to the data with missing entries are compared with the known regression equations. In almost all the cases which were investigated the former method (ordinary least squares applied only to the complete observations) is judged superior. However, when the proportion of incomplete observations is high or when the pattern of the missing entries is highly non-random, it seems plausible that one of the many methods of assigning values to the missing entries should be applied.
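The two methods compared in the summary can be illustrated with a minimal sketch for a regression of y on two predictors. This is not the paper's code. The function names, the toy data, and the choice to centre each covariance on the means of its own paired observations are assumptions made for illustration, and only the slope coefficients are computed.

```python
from typing import List, Optional, Tuple

# One observation: (x1, x2, y); None marks a missing entry.
Row = Tuple[Optional[float], Optional[float], Optional[float]]


def pairwise_cov(rows: List[Row], i: int, j: int) -> float:
    """Sample covariance of columns i and j over rows where both are observed."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    n = len(pairs)
    mean_i = sum(a for a, _ in pairs) / n
    mean_j = sum(b for _, b in pairs) / n
    return sum((a - mean_i) * (b - mean_j) for a, b in pairs) / (n - 1)


def slopes_from_covariances(rows: List[Row]) -> Tuple[float, float]:
    """Solve the 2x2 normal equations S_xx b = s_xy built from pairwise covariances."""
    s11 = pairwise_cov(rows, 0, 0)
    s22 = pairwise_cov(rows, 1, 1)
    s12 = pairwise_cov(rows, 0, 1)
    s1y = pairwise_cov(rows, 0, 2)
    s2y = pairwise_cov(rows, 1, 2)
    det = s11 * s22 - s12 * s12  # assumed non-zero (predictors not collinear)
    return ((s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)


def complete_case_slopes(rows: List[Row]) -> Tuple[float, float]:
    """Method 1: discard every incomplete observation, then ordinary least squares."""
    complete = [r for r in rows if None not in r]
    return slopes_from_covariances(complete)


def pairwise_slopes(rows: List[Row]) -> Tuple[float, float]:
    """Method 2: each covariance uses all rows where both of its variables are observed."""
    return slopes_from_covariances(rows)


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2 with no noise;
# two entries were then deleted.
DATA: List[Row] = [
    (0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 4.0),
    (1.0, 1.0, 6.0), (2.0, 1.0, 8.0), (1.0, 2.0, 9.0),
    (None, 2.0, 11.0),  # x1 missing
    (3.0, None, 7.0),   # x2 missing
]

if __name__ == "__main__":
    print("complete-case slopes:", complete_case_slopes(DATA))
    print("pairwise slopes:     ", pairwise_slopes(DATA))
```

On fully observed data the two functions coincide, since every pairwise covariance then uses the same rows. They differ only once entries are missing, which is the situation the simulation study examines.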