The relative importance of persons, items, subtests and languages to TOEFL test variance
Open Access
- 1 April 1999
- journal article
- research article
- Published by SAGE Publications in Language Testing
- Vol. 16 (2), 217-238
- https://doi.org/10.1177/026553229901600205
Abstract
The purpose of this project was to explore the relative contributions to TOEFL score dependability (which is analogous to classical theory reliability) of various numbers of persons, items, subtests, languages and their various interactions. To these ends, three research questions were formulated: (1) What are the characteristics of the distributions, and how high are the classical theory reliability estimates for the whole test and its subtests? (2) For each of the 15 languages, what are the relative contributions to test variance of persons, items, subtests and their interactions? (3) Across all 15 languages, what are the relative contributions to test variance of persons, items, subtests and languages, as well as their various interactions? The study sampled 15 000 test takers, 1000 each from 15 different language backgrounds, from the total of 24 500 participants in the TOEFL generic data set, which itself was a sample from the May 1991 worldwide administration of the TOEFL. The test was administered under normal operational conditions and included all three subtests: (1) Listening Comprehension, (2) Structure and Written Expression, and (3) Vocabulary and Reading Comprehension. The analyses included descriptive statistics, classical theory reliability estimates, and a series of generalizability studies conducted to isolate the variance components due to persons, items, subtests and languages, and their effects on the dependability of the test. Unlike previous research, the results here indicate that, when considered in concert with other important sources of variance (persons, items and subtests), language differences alone account for only a very small proportion of TOEFL test variance. These results should prove useful to test developers and researchers interested in the relative effects of such factors on test design.
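For readers unfamiliar with generalizability theory, the sketch below illustrates the kind of variance decomposition the abstract describes, reduced to the simplest crossed design of persons x items (the study itself also crossed subtests and languages). It estimates variance components from mean squares and computes a dependability (Phi) coefficient. This is a minimal illustration only; the function names, the simulated data and the two-facet simplification are assumptions of this sketch, not the authors' analysis.

```python
import numpy as np

def g_study_p_x_i(scores):
    """Estimate variance components for a persons x items (p x i) G-study.

    scores: 2-D array, rows = persons, columns = items (e.g. 0/1 item scores).
    Returns variance components for persons, items and the person-by-item
    interaction (confounded with error in this single-observation design).
    """
    n_p, n_i = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    item_means = scores.mean(axis=0)

    # Sums of squares for the two main effects and the residual
    ss_p = n_i * np.sum((person_means - grand) ** 2)
    ss_i = n_p * np.sum((item_means - grand) ** 2)
    ss_total = np.sum((scores - grand) ** 2)
    ss_pi = ss_total - ss_p - ss_i

    # Mean squares
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

    # Variance components from the expected mean squares
    var_pi = ms_pi
    var_p = max((ms_p - ms_pi) / n_i, 0.0)
    var_i = max((ms_i - ms_pi) / n_p, 0.0)
    return var_p, var_i, var_pi


def dependability(var_p, var_i, var_pi, n_items):
    """Phi (index of dependability) for absolute decisions with n_items items."""
    return var_p / (var_p + (var_i + var_pi) / n_items)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 100 simulated test takers answering 40 dichotomous items
    ability = rng.normal(size=(100, 1))
    difficulty = rng.normal(size=(1, 40))
    scores = (ability - difficulty + rng.normal(size=(100, 40)) > 0).astype(float)

    var_p, var_i, var_pi = g_study_p_x_i(scores)
    print("variance components (p, i, pi):", var_p, var_i, var_pi)
    print("dependability (Phi):", dependability(var_p, var_i, var_pi, scores.shape[1]))
```

The person variance component here plays the role of "true" score variance; the item and interaction components are the error sources, so adding items (a D-study question) raises the dependability estimate in the same way lengthening a test raises classical reliability.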