Abstract
Pair-matching, frequency-matching, and stratification are used to obtain perfect exposure balance within strata defined by levels of confounders. We consider the consequences of pooling such balanced data across strata in the analysis. We examine model-based inference for a variety of response variables and generalized linear regressions, as well as for the Cox (1972, Journal of the Royal Statistical Society, Series B 34, 187-220) model. If the data are pooled so that stratum effects are omitted from the regression, certain models retain nominal size, including all Poisson models and all normal models with known variance. The model-based tests have supranominal size for all exponential models and subnominal size for all Bernoulli models and for the Cox model. These results contrast with a previous analysis of simple randomization, which ensures exposure balance only on average. With simple randomization, omission of stratum effects from these models leads to model-based significance tests with supranominal size, except for the Cox and Bernoulli models, which retain nominal size. Just as for simple randomization, however, it is wise to include stratum effects in the analysis of perfectly balanced data to avoid biased estimation of exposure effects and to increase the efficiency of the comparison. Though motivated by cohort data, these results also apply to balanced case-control data under the logistic model.
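The subnominal size claimed for pooled Bernoulli analyses can be illustrated with a small simulation. The sketch below is not the authors' derivation; it is a minimal illustration under assumptions chosen here: pair-matched strata with one exposed and one unexposed subject each, no exposure effect, and stratum risks alternating between 0.05 and 0.95 (hypothetical values). The pooled test is the two-sample test of proportions, used as a stand-in for the score test from a logistic regression containing exposure only, and McNemar's test is used as the stratum-respecting comparison. The function name `simulate_sizes` and all parameter values are illustrative.

```python
import math
import random


def simulate_sizes(n_pairs=200, n_reps=2000, seed=1):
    """Estimate the size of pooled and matched tests under the null.

    Each stratum is a matched pair (one exposed, one unexposed subject)
    sharing the same outcome risk, so exposure is perfectly balanced
    within strata and has no effect on the outcome.
    Returns (pooled_rejection_rate, matched_rejection_rate).
    """
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided 5% critical value of the standard normal
    pooled_rejections = 0
    matched_rejections = 0

    for _ in range(n_reps):
        exposed_events = 0
        unexposed_events = 0
        b = 0  # pairs where only the exposed subject has the event
        c = 0  # pairs where only the unexposed subject has the event

        for s in range(n_pairs):
            # Assumed (hypothetical) strong stratum effect on risk.
            risk = 0.05 if s % 2 == 0 else 0.95
            y_exposed = rng.random() < risk
            y_unexposed = rng.random() < risk
            exposed_events += y_exposed
            unexposed_events += y_unexposed
            if y_exposed and not y_unexposed:
                b += 1
            elif y_unexposed and not y_exposed:
                c += 1

        # Pooled analysis: strata ignored, variance from the pooled proportion.
        p_bar = (exposed_events + unexposed_events) / (2 * n_pairs)
        pooled_var = 2 * p_bar * (1 - p_bar) / n_pairs
        if pooled_var > 0:
            diff = (exposed_events - unexposed_events) / n_pairs
            if abs(diff / math.sqrt(pooled_var)) > z_crit:
                pooled_rejections += 1

        # Matched analysis: McNemar's test on the discordant pairs.
        if b + c > 0:
            if abs((b - c) / math.sqrt(b + c)) > z_crit:
                matched_rejections += 1

    return pooled_rejections / n_reps, matched_rejections / n_reps


if __name__ == "__main__":
    pooled, matched = simulate_sizes()
    print(f"pooled test rejection rate:  {pooled:.4f}")
    print(f"matched test rejection rate: {matched:.4f}")
```

Under these assumed settings the pooled variance estimate, based on an overall event proportion near 0.5, is much larger than the true variance of the difference in proportions, so the pooled test should reject far less often than the nominal 5%. The matched test should stay much closer to nominal, although it is discrete and therefore not exact.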