A Comparison of Item- and Person-Fit Methods of Assessing Model-Data Fit in IRT

1 June 1990

journal article
Published by SAGE Publications in Applied Psychological Measurement

Vol. 14 (2), 127-137
https://doi.org/10.1177/014662169001400202

Abstract

Many item-fit statistics have been proposed for assessing whether the responses to test items aggregated across examinees conform to IRT test models. Conversely, person-fit statistics have been proposed for assessing whether an examinee's responses aggregated across items are congruent with a specified IRT model. Statistical procedures to assess item fit have differed from those to assess person fit. This research compared a χ² item-fit index with a likelihood-based person-fit index. Eight 0,1 data matrices were simulated under the three-parameter logistic test model. Both the likelihood-based and χ² fit statistics were then computed for examinees and items, and Type I and Type II error rates were analyzed. With data simulated to fit the IRT model, the χ² test overidentified examinees and items as being misfitting, while the likelihood-based fit index held closer to the specified α levels. The two fit indices gave consistent (mis)fit-to-model results in 94 and 97 percent of cases for items and examinees, respectively, across simulations. Under simulated conditions of data misfit, the χ² statistic detected misfit at a higher rate than the likelihood-based statistic, indicating that the χ² statistic was slightly more sensitive to response pattern aberrancy. However, other considerations led to a recommendation for employing the likelihood-based index in applied fit analyses to evaluate both examinee and item model-data (mis)fit.
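
As a rough illustration of the kind of simulation the abstract describes, the sketch below generates one 0,1 data matrix under the three-parameter logistic model and computes a likelihood-based person-fit value for each simulated examinee. The abstract does not name the likelihood-based index, so the standardized log-likelihood (often written l_z) is used here as an assumption. The sample sizes, parameter distributions, the omission of the 1.7 scaling constant, the use of true rather than estimated parameters, and the function names `p_3pl`, `simulate_matrix`, and `lz_person_fit` are likewise illustrative choices, not details taken from the article.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_matrix(thetas, items, rng):
    """Simulate a 0/1 response matrix (rows: examinees, columns: items)."""
    return [
        [1 if rng.random() < p_3pl(t, a, b, c) else 0 for (a, b, c) in items]
        for t in thetas
    ]


def lz_person_fit(responses, theta, items):
    """Standardized log-likelihood of one examinee's response pattern."""
    loglik = expected = variance = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        q = 1.0 - p
        loglik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (loglik - expected) / math.sqrt(variance)


# Hypothetical simulation settings (assumptions, not values from the study).
rng = random.Random(0)
n_examinees, n_items = 500, 40
thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
items = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0), 0.2) for _ in range(n_items)]

data = simulate_matrix(thetas, items, rng)
lz_values = [lz_person_fit(row, t, items) for row, t in zip(data, thetas)]

# Share of model-fitting examinees flagged at a one-tailed alpha of .05,
# using the normal approximation (large negative values suggest misfit).
flag_rate = sum(v < -1.645 for v in lz_values) / n_examinees
print(f"flagged as misfitting: {flag_rate:.3f}")
```

Because the data here are generated to fit the model, the flagged share is an empirical Type I error rate for this index under these assumed settings; comparing it with the nominal α is the kind of check the abstract reports.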
