The relative importance of persons, items, subtests and languages to TOEFL test variance
Open Access
- 1 April 1999
- journal article
- research article
- Published by SAGE Publications in Language Testing
- Vol. 16 (2), 217-238
- https://doi.org/10.1177/026553229901600205
Abstract
The purpose of this project was to explore the relative contributions to TOEFL score dependability (which is analogous to classical theory reliability) of various numbers of persons, items, subtests, languages and their various interactions. To these ends, three research questions were formulated: (1) What are the characteristics of the distributions, and how high are the classical theory reliability estimates for the whole test and its subtests? (2) For each of the 15 languages, what are the relative contributions to test variance of persons, items, subtests and their interactions? (3) Across all 15 languages, what are the relative contributions to test variance of persons, items, subtests and languages, as well as their various interactions? The study sampled 15 000 test takers, 1000 each from 15 different language backgrounds, from the total of 24 500 participants in the TOEFL generic data set, which itself was a sample from the May 1991 worldwide administration of the TOEFL. The test was administered under normal operational conditions and included all three subtests: (1) Listening Comprehension, (2) Structure and Written Expression, and (3) Vocabulary and Reading Comprehension. The analyses included descriptive statistics, classical theory reliability estimates, and a series of generalizability studies conducted to isolate the variance components due to persons, items, subtests and languages, and their effects on the dependability of the test. Unlike previous research, the results here indicate that, when considered in concert with other important sources of variance (persons, items and subtests), language differences alone account for only a very small proportion of TOEFL test variance. These results should prove useful to test developers and researchers interested in the relative effects of such factors on test design.
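For readers unfamiliar with generalizability theory, the sketch below illustrates the kind of variance decomposition the abstract describes, reduced to the simplest crossed design of persons x items (the study itself also crossed subtests and languages). It estimates variance components from mean squares and computes a dependability (Phi) coefficient. This is a minimal illustration only; the function names, the simulated data and the two-facet simplification are assumptions of this sketch, not the authors' analysis.

```python
import numpy as np

def g_study_p_x_i(scores):
    """Estimate variance components for a persons x items (p x i) G-study.

    scores: 2-D array, rows = persons, columns = items (e.g. 0/1 item scores).
    Returns variance components for persons, items and the person-by-item
    interaction (confounded with error in this single-observation design).
    """
    n_p, n_i = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    item_means = scores.mean(axis=0)

    # Sums of squares for the two main effects and the residual
    ss_p = n_i * np.sum((person_means - grand) ** 2)
    ss_i = n_p * np.sum((item_means - grand) ** 2)
    ss_total = np.sum((scores - grand) ** 2)
    ss_pi = ss_total - ss_p - ss_i

    # Mean squares
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

    # Variance components from the expected mean squares
    var_pi = ms_pi
    var_p = max((ms_p - ms_pi) / n_i, 0.0)
    var_i = max((ms_i - ms_pi) / n_p, 0.0)
    return var_p, var_i, var_pi


def dependability(var_p, var_i, var_pi, n_items):
    """Phi (index of dependability) for absolute decisions with n_items items."""
    return var_p / (var_p + (var_i + var_pi) / n_items)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 100 simulated test takers answering 40 dichotomous items
    ability = rng.normal(size=(100, 1))
    difficulty = rng.normal(size=(1, 40))
    scores = (ability - difficulty + rng.normal(size=(100, 40)) > 0).astype(float)

    var_p, var_i, var_pi = g_study_p_x_i(scores)
    print("variance components (p, i, pi):", var_p, var_i, var_pi)
    print("dependability (Phi):", dependability(var_p, var_i, var_pi, scores.shape[1]))
```

The person variance component here plays the role of "true" score variance; the item and interaction components are the error sources, so adding items (a D-study question) raises the dependability estimate in the same way lengthening a test raises classical reliability.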