Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression

Abstract
We describe methods used to create a new Census database that can be used to study the comparability of industry and occupation classification systems. This project represents the most extensive application of multiple imputation to date, and the modeling effort was considerable as well: hundreds of logistic regressions were estimated. One goal of this article is to summarize the strategies used in the project so that researchers can better understand how the new databases were created. Another goal is to show how maximum likelihood methods were modified for the modeling and imputation phases of the project. To multiply impute 1980 census-comparable codes for industries and occupations in two 1970 census public-use samples, logistic regression models were estimated with flattening constants. For many of the regression models considered, the data were too sparse to support conventional maximum likelihood analysis, so some alternative had to be employed. These methods solve existence and ...
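The idea of estimating a logistic regression with flattening constants and then multiply imputing from it can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the article's procedure: the toy counts, the binary outcome (a hypothetical choice between two 1980-comparable codes), the particular flattening rule (spreading as many pseudo-observations as there are parameters evenly over the covariate cells, split by the overall observed success rate), and the normal approximation used to draw coefficients. The toy data are completely separated, so the ordinary maximum likelihood estimate does not exist, while the flattened fit is finite.

```python
import numpy as np

def sigmoid(eta):
    # Numerically stable logistic function.
    return np.exp(-np.logaddexp(0.0, -eta))

def add_flattening(successes, trials, n_params):
    # Assumed rule: add n_params pseudo-observations in total, spread evenly
    # over the cells and split by the overall observed success rate.
    overall = successes.sum() / trials.sum()
    per_cell = n_params / len(trials)
    return successes + per_cell * overall, trials + per_cell

def fit_logistic(X, successes, trials, n_iter=100, tol=1e-10):
    # Newton-Raphson with step halving for grouped (possibly fractional) counts.
    def loglik(b):
        eta = X @ b
        return np.sum(successes * eta - trials * np.logaddexp(0.0, eta))

    def hessian(b):
        p = sigmoid(X @ b)
        return X.T @ (X * (trials * p * (1.0 - p))[:, None])

    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (successes - trials * sigmoid(X @ beta))
        step = np.linalg.solve(hessian(beta), grad)
        halvings = 0
        while loglik(beta + step) < loglik(beta) and halvings < 30:
            step = 0.5 * step
            halvings += 1
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    cov = np.linalg.inv(hessian(beta))
    return beta, 0.5 * (cov + cov.T)  # symmetrize against rounding error

# Hypothetical toy data: four covariate cells, completely separated outcomes.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
successes = np.array([0.0, 0.0, 5.0, 4.0])
trials = np.array([6.0, 5.0, 5.0, 4.0])

s_flat, t_flat = add_flattening(successes, trials, n_params=X.shape[1])
beta_hat, cov = fit_logistic(X, s_flat, t_flat)

# Multiple imputation for units with a missing code (hypothetical covariates):
# draw coefficients from an approximate posterior, then draw the outcome.
rng = np.random.default_rng(0)
x_missing = np.array([[1.0, 1.5], [1.0, 0.5], [1.0, 2.5]])
M = 5
imputations = np.empty((M, len(x_missing)), dtype=int)
for m in range(M):
    beta_draw = rng.multivariate_normal(beta_hat, cov)
    imputations[m] = rng.binomial(1, sigmoid(x_missing @ beta_draw))
```

Drawing the coefficients afresh for each of the M imputed data sets, rather than reusing the point estimate, is what lets the imputations reflect uncertainty in the fitted model as well as in the outcome itself.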
