Evaluation of the Efficiency of Item Calibration

Abstract
This study compared several IRT calibration procedures to determine which procedure, if any, consistently produced the most accurate item parameter estimates. A new criterion of calibration efficiency was used for evaluating the calibration procedures; this criterion considers the joint effects of individual item parameter errors as they relate to the accuracy of θ estimation. Four methods of item calibration were evaluated: (1) heuristic estimates obtained from transformations of traditional item statistics; (2) ANCILLES, a program that first fits the c parameter and then transforms traditional item statistics to IRT a and b parameters; (3) LOGIST, a joint maximum likelihood procedure; and (4) ASCAL, a modification of LOGIST's algorithm which applies Bayesian priors to the abilities and item parameters. These were compared with each other and with a constant item parameter baseline condition. ASCAL and LOGIST produced estimates of essentially equivalent accuracy, although ASCAL's estimates of the c parameters were slightly superior. The heuristic estimates and those from ANCILLES were generally poor in comparison, particularly for smaller sample sizes.

Index terms: Calibration efficiency, Item calibration, Item parameter estimation, Item response theory, Latent trait models.
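
For readers unfamiliar with the a, b, and c parameters named in the abstract, the following minimal sketch shows the standard three-parameter logistic item response function that such parameters usually refer to. The function name `p_correct_3pl` and the scaling constant D = 1.7 are assumptions made for illustration; the abstract does not state which form or scaling the study used.

```python
import math


def p_correct_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing parameter)
    D: scaling constant; 1.7 is a common convention and an assumption here
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


if __name__ == "__main__":
    # When ability equals difficulty, the probability is halfway between c and 1.
    print(p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
```

Errors in the estimated a, b, and c values change this curve, which is why a criterion based on the accuracy of θ estimation can reflect the joint effect of all three parameter errors.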