Avoiding ‘Data Snooping’ in Multilevel and Mixed Effects Models

Abstract

Multilevel or mixed effects models are commonly applied to hierarchical data. The level 2 residuals, also known as random effects, are often of both substantive and diagnostic interest. Substantively, they are frequently used for institutional comparisons or rankings. Diagnostically, they are used to assess the model assumptions at the group level. Inference on the level 2 residuals, however, typically does not account for ‘data snooping’, i.e. for the harmful effects of carrying out a multitude of hypothesis tests at the same time. We provide a very general framework that encompasses both of the following inference problems: inference on the ‘absolute’ level 2 residuals to determine which are significantly different from 0, and inference on any prespecified number of pairwise comparisons. Thus, the user can choose which comparisons to test. Because our methods accommodate different estimation methods, the user may also choose the estimation method. We demonstrate the methods with the London education authority data, the wafer data and the National Educational Longitudinal Study data.
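
To make the ‘data snooping’ problem concrete, the following minimal sketch contrasts testing each level 2 residual against 0 at the 5% level with a test that controls the familywise error rate. It is an illustration only and not the framework proposed in the paper: it swaps in the classical Holm step-down procedure, assumes a normal approximation for each residual divided by its standard error, and uses hypothetical residual estimates and standard errors.

```python
import math

def two_sided_p(estimate, std_error):
    """Two-sided p-value under an assumed normal approximation."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))

def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: controls the familywise error rate."""
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Threshold loosens as hypotheses are rejected: alpha/m, alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject

# Hypothetical level 2 residual estimates and standard errors (not real data).
residuals = [0.50, -0.20, 0.05, 0.30, -0.45]
std_errors = [0.15, 0.10, 0.12, 0.13, 0.14]

p_values = [two_sided_p(u, s) for u, s in zip(residuals, std_errors)]
naive = [p <= 0.05 for p in p_values]   # ignores multiplicity
holm = holm_reject(p_values, alpha=0.05)

print("naive:", naive)
print("holm: ", holm)
```

With these made-up numbers, the unadjusted tests flag four of the five groups, while the Holm procedure flags only the two with the strongest evidence. Holm's method ignores the dependence between the test statistics, so it serves here only as a simple baseline for the multiplicity adjustment.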

All Related Versions

Version 1, 2005-01-01, RePEc (Unconfirmed version)
