Sensitivity of Equating Results to Different Sampling Strategies

1 January 1990

journal article
Published by Taylor & Francis in Applied Measurement in Education

Vol. 3 (1), 53-71
https://doi.org/10.1207/s15324818ame0301_5

Abstract

In this article, the results of equating two parallel forms of the College Board Biology Achievement Test using three different sampling strategies are discussed. New-form data were collected during a fall administration of the test, and old-form data were collected at a spring administration. The group taking the test in the spring was much more able, as measured by test score, than the group taking the test in the fall. The three sampling strategies studied were representative sampling, matched sampling, and reference or target sampling. For each sampling strategy, five equating procedures were studied: Tucker and Levine unequally reliable linear equatings, frequency estimation equipercentile and chained equipercentile curvilinear equatings, and three-parameter logistic (3PL) item response theory (IRT) true-score equating. The criterion for comparison in all cases was the result of a Tucker linear equating from a fall new-form/fall old-form representative sampling data collection design. Results of this study indicated that matching on a set of common items provided greater agreement among the results of the various equating procedures studied than was obtained under representative sampling. In addition, for all equating procedures, the results of equating with samples matched on common item scores agreed more closely with the criterion equating than did the equating results from representative samples. Matching to an external target population produced agreement among methods, but its results did not agree as closely with the criterion equating as those from matching to the new form on the basis of common item scores. The equating models least affected by differences in new-form and old-form sample abilities were the Tucker and frequency estimation equipercentile models, and the procedure most affected by ability differences was the 3PL IRT procedure.
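As an illustration of what matching on common item scores can look like in practice, the following is a minimal Python sketch, not the procedure used in the article. It assumes each old-form examinee is stored as a hypothetical (anchor_score, total_score) pair, and it draws an old-form subsample whose anchor-score distribution is proportional to that of the new-form group. The function name match_on_anchor and its parameters are invented for this example.

```python
import random
from collections import Counter, defaultdict
from fractions import Fraction
from math import floor


def match_on_anchor(new_anchor_scores, old_records, seed=0):
    """Draw a subsample of old_records whose anchor-score distribution is
    proportional to that of new_anchor_scores.

    old_records is a list of (anchor_score, total_score) pairs
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    new_counts = Counter(new_anchor_scores)

    # Group old-form examinees by their anchor (common item) score.
    old_by_score = defaultdict(list)
    for record in old_records:
        old_by_score[record[0]].append(record)

    # Largest scale factor k such that k * new_counts[v] examinees are
    # available in the old-form group at every anchor score v.
    k = min(Fraction(len(old_by_score[v]), n) for v, n in new_counts.items())

    matched = []
    for v, n in sorted(new_counts.items()):
        # Sample without replacement within each anchor-score stratum.
        matched.extend(rng.sample(old_by_score[v], floor(k * n)))
    return matched


if __name__ == "__main__":
    new_anchor = [1, 1, 2, 3]
    old = [(1, 10), (1, 12), (1, 11), (1, 14),
           (2, 20), (2, 22),
           (3, 30), (3, 31), (3, 33)]
    sample = match_on_anchor(new_anchor, old)
    print(sorted(Counter(r[0] for r in sample).items()))
```

If some anchor score observed in the new-form group has no old-form examinees, this sketch returns an empty sample; a practical implementation would more likely pool adjacent score levels or reweight instead.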
