Design effects for binary regression models fitted to dependent data

Abstract
Dependent data, such as those arising from cluster sampling, typically yield parameter estimates whose variances are larger than those obtained from a simple random sample of the same size. The resulting variance inflation factor is called the design effect of the estimator. Design effects have been derived for simple estimators such as means and proportions under cluster sampling designs, and also for linear regression coefficient estimators. In this paper, we show that a method for deriving design effects for linear regression estimators extends to generalized linear models for binary responses. In particular, some simple expressions for design effects in the linear regression model provide accurate approximations for binary regression models such as those based on the logistic, probit and complementary log-log links. We corroborate our findings with two examples and some simulation studies.