Pre‐natal blood lead levels and learning difficulties in children: An analysis of non‐randomly missing categorical data

Abstract
This paper presents an analysis of categorical variables subject to non‐response. We incorporate the incomplete data into the analysis by modelling both the distribution of the variables of interest and the non‐response mechanism. We discuss issues of model selection and interpretation and the effect of discarding incomplete observations. In addition, we describe how to perform all of the computations with standard statistical software. We discuss the problem of incomplete categorical data within the context of a study of the effect of lead exposure on learning difficulties in children. In this study, many of the children are not observed on some of the variables of interest. It is particularly important in this study to incorporate the incomplete data, since there is evidence that non‐response is related to the variables of interest. We reach different conclusions when we incorporate the incomplete data into the analysis than when we discard them. We also examine the sensitivity of our conclusions to the choice of a model for the non‐response mechanism.