Thin Versus Thick Matching in the Mantel-Haenszel Procedure for Detecting DIF

Abstract
This Monte Carlo study examined strategies for forming the matching variable for the Mantel-Haenszel (MH) differential item functioning (DIF) procedure; thin matching on total test score was compared to forms of thick matching, pooling levels of the matching variable. Data were generated using a three-parameter logistic (3PL) item response theory (IRT) model with common guessing parameter. Number of subjects and test length were manipulated, as were the difficulty, discrimination, and presence/absence of DIF in the studied item. Outcome measures were the transformed log-odds &Deltacirc; MH, its standard error, and the MH chi-square statistic. For short tests (5 or 10 items), thin matching yielded very poor results, with a tendency to falsely identify items as possessing DIF against the reference group. The best methods of thick matching yielded outcome measure values closer to the expected value for non-DIF items, as well as a larger value than thin matching when the studied item possessed DIF. Intermediate length tests yielded similar results for thin matching and the best methods of thick matching. The method of thick matching that performed best depended on the measure used to detect DIF. Both difficulty and discrimination of the studied item were found to have a strong effect on the value of &Deltacirc; MH.