Accounting for Statistical Artifacts in Item Bias Research
- 1 June 1984
- journal article
- Published by American Educational Research Association (AERA) in Journal of Educational Statistics
- Vol. 9 (2), 93-128
- https://doi.org/10.3102/10769986009002093
Abstract
Theoretically preferred IRT bias detection procedures were applied to both a mathematics achievement test and a vocabulary test. The data were from black and white seniors in the High School and Beyond data files. To account for statistical artifacts, each analysis was repeated on randomly equivalent samples of blacks and whites (n's = 1,500). Furthermore, to establish a baseline for judging bias indices that might be attributable only to sampling fluctuations, bias analyses were conducted comparing randomly selected groups of whites. To assess the effect of mean group differences on the appearance of bias, pseudo-ethnic groups were created; that is, samples of whites were selected to simulate the average black-white difference. The validity and sensitivity of the IRT bias indices were supported by several findings. A relatively large number of items (10 of 29) on the math test were found to be consistently biased, and these results were replicated in parallel analyses. The bias indices were substantially smaller in white-white analyses. Furthermore, the indices (with the possible exception of χ²) did not find bias in the pseudo-ethnic comparison. The pattern of between-study correlations showed high consistency for parallel ethnic analyses where bias was plausibly present. The indices also met the discriminant validity test: correlations were low between conditions where bias should not be present. For the math test, where a substantial number of items appeared biased, the results were interpretable; verbal math problems were systematically more difficult for blacks. Overall, the sums-of-squares statistics (weighted by the inverse of the error variances) were judged to be the best indices for quantifying ICC differences between groups. Not only were these statistics the most consistent in detecting bias in the ethnic comparisons, but they also intercorrelated the least in situations of no bias.
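As a rough illustration of the kind of index the abstract refers to, the sketch below computes a weighted sums-of-squares difference between two groups' estimated item characteristic curves under a 3PL model. The function names, parameter values, and the inverse-variance weighting term are illustrative assumptions, not the authors' exact procedure.

```python
# A minimal sketch, assuming a 3PL ICC and inverse-variance weighting of the
# squared gap between reference- and focal-group curves across ability values.
import numpy as np

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic item characteristic curve P(theta)."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def weighted_sos_index(theta, params_ref, params_focal, var_ref, var_focal):
    """Weighted sums-of-squares difference between two groups' ICCs.

    theta      : ability values at which the curves are compared
    params_*   : (a, b, c) item parameter estimates for each group
    var_*      : sampling variances of the estimated ICC at each theta (placeholder)
    """
    p_ref = icc_3pl(theta, *params_ref)
    p_focal = icc_3pl(theta, *params_focal)
    weights = 1.0 / (var_ref + var_focal)   # inverse-variance weighting (assumed form)
    return np.sum(weights * (p_ref - p_focal) ** 2)

# Illustrative use with made-up parameter estimates for a single item
theta = np.linspace(-3, 3, 61)
index = weighted_sos_index(
    theta,
    params_ref=(1.2, 0.0, 0.2),              # hypothetical reference-group estimates
    params_focal=(1.2, 0.5, 0.2),            # hypothetical focal-group estimates
    var_ref=np.full_like(theta, 0.002),      # placeholder sampling variances
    var_focal=np.full_like(theta, 0.002),
)
print(round(index, 2))
```

A larger value of the index indicates a bigger discrepancy between the two curves; in this sketch the difference comes entirely from the shifted difficulty parameter of the focal group.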