IRT Versus Conventional Equating Methods: A Comparative Study of Scale Stability
- 1 June 1983
- journal article
- Published by American Educational Research Association (AERA) in Journal of Educational Statistics
- Vol. 8 (2) , 137-156
- https://doi.org/10.3102/10769986008002137
Abstract
Scale drift for the verbal and mathematical portions of the Scholastic Aptitude Test (SAT) was investigated using linear, equipercentile and item response theory (IRT) equating methods. The linear methods investigated were the Tucker, Levine Equally Reliable and Levine Unequally Reliable models. Three IRT calibration designs were employed. These designs are referred to as (1) concurrent, (2) fixed b’s method, and (3) characteristic curve transformation method. The results of the various equating methods were compared both graphically and analytically. These results indicated that for reasonably parallel tests, linear equating methods perform adequately. However, when tests differ somewhat in content and length, methods based on the three-parameter logistic IRT model lead to greater stability of equating results. Of the conventional equating methods investigated, the Levine Equally Reliable model appears to be the most robust for the type of equating situation used in this study. The IRT method that provided the most stable equating results overall was the concurrent calibration method.