An Exploration of the Robustness of Four Test Equating Models

Abstract
This Monte Carlo study explored how four commonly used test equating methods (linear, equipercentile, and item response theory methods based on the Rasch and three-parameter models) responded to tests of different psychometric properties. The four methods were applied to generated data sets where mean item difficulty and discrimination as well as level of chance scoring were manipulated. In all cases, examinee ability was matched to the level of difficulty of the tests. The results showed the Rasch model not to be very robust to violations of the equal discrimination and non-chance scoring assumptions. There were also problems with the three-parameter model, but these were due primarily to estimation and linking problems. The recommended procedure for tests similar to those studied is the equipercentile method.
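
For readers unfamiliar with the two item response theory models named above, the following sketch shows the usual logistic forms. The notation is an assumption, since the abstract gives no formulas: \theta_j is the ability of examinee j, and b_i, a_i, and c_i are the difficulty, discrimination, and lower asymptote (chance-scoring level) of item i. Some presentations also include a scaling constant D = 1.7 in the exponent, which is omitted here.

```latex
% Three-parameter logistic model (assumed standard form; notation is illustrative)
P_i(\theta_j) = c_i + (1 - c_i)\,
  \frac{1}{1 + \exp\!\bigl[-a_i(\theta_j - b_i)\bigr]}

% Rasch model: the special case with a common discrimination (a_i = 1)
% and no chance scoring (c_i = 0)
P_i(\theta_j) = \frac{1}{1 + \exp\!\bigl[-(\theta_j - b_i)\bigr]}
```

Written this way, the two assumptions the study manipulates are visible as parameter constraints: equal discrimination corresponds to holding a_i constant across items, and non-chance scoring corresponds to c_i = 0.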