Accounting for Statistical Artifacts in Item Bias Research
- 1 June 1984
- journal article
- Published by American Educational Research Association (AERA) in Journal of Educational Statistics
- Vol. 9 (2), 93-128
- https://doi.org/10.3102/10769986009002093
Abstract
Theoretically preferred IRT bias detection procedures were applied to both a mathematics achievement test and a vocabulary test. The data were from black and white seniors in the High School and Beyond data files. To account for statistical artifacts, each analysis was repeated on randomly equivalent samples of blacks and whites (n's = 1,500). Furthermore, to establish a baseline for judging bias indices that might be attributable only to sampling fluctuations, bias analyses were conducted comparing randomly selected groups of whites. To assess the effect of mean group differences on the appearance of bias, pseudo-ethnic groups were created; that is, samples of whites were selected to simulate the average black-white difference. The validity and sensitivity of the IRT bias indices were supported by several findings. A relatively large number of items (10 of 29) on the math test were found to be consistently biased, and these results were replicated in parallel analyses. The bias indices were substantially smaller in white-white analyses. Furthermore, the indices (with the possible exception of χ²) did not find bias in the pseudo-ethnic comparison. The pattern of between-study correlations showed high consistency for parallel ethnic analyses where bias was plausibly present. The indices also met the discriminant validity test: correlations were low between conditions where bias should not be present. For the math test, where a substantial number of items appeared biased, the results were interpretable; verbal math problems were systematically more difficult for blacks. Overall, the sums-of-squares statistics (weighted by the inverse of the error variances) were judged to be the best indices for quantifying ICC differences between groups. Not only were these statistics the most consistent in detecting bias in the ethnic comparisons, but they also intercorrelated the least in situations of no bias.
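As a rough illustration of the kind of index the abstract refers to, the sketch below computes a weighted sums-of-squares difference between two groups' estimated item characteristic curves under a 3PL model. The function names, parameter values, and the inverse-variance weighting term are illustrative assumptions, not the authors' exact procedure.

```python
# A minimal sketch, assuming a 3PL ICC and inverse-variance weighting of the
# squared gap between reference- and focal-group curves across ability values.
import numpy as np

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic item characteristic curve P(theta)."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def weighted_sos_index(theta, params_ref, params_focal, var_ref, var_focal):
    """Weighted sums-of-squares difference between two groups' ICCs.

    theta      : ability values at which the curves are compared
    params_*   : (a, b, c) item parameter estimates for each group
    var_*      : sampling variances of the estimated ICC at each theta (placeholder)
    """
    p_ref = icc_3pl(theta, *params_ref)
    p_focal = icc_3pl(theta, *params_focal)
    weights = 1.0 / (var_ref + var_focal)   # inverse-variance weighting (assumed form)
    return np.sum(weights * (p_ref - p_focal) ** 2)

# Illustrative use with made-up parameter estimates for a single item
theta = np.linspace(-3, 3, 61)
index = weighted_sos_index(
    theta,
    params_ref=(1.2, 0.0, 0.2),              # hypothetical reference-group estimates
    params_focal=(1.2, 0.5, 0.2),            # hypothetical focal-group estimates
    var_ref=np.full_like(theta, 0.002),      # placeholder sampling variances
    var_focal=np.full_like(theta, 0.002),
)
print(round(index, 2))
```

A larger value of the index indicates a bigger discrepancy between the two curves; in this sketch the difference comes entirely from the shifted difficulty parameter of the focal group.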