Trace Lines for Testlets: A Use of Multiple‐Categorical‐Response Models

It is not always convenient or appropriate to construct tests in which individual items are fungible. There are situations in which small clusters of items (testlets) are the units that are assembled to create a test. Using data from a test of reading comprehension constructed of four passages with several questions following each passage, we show that local independence fails at the level of the individual questions. The questions following each passage, however, constitute a testlet. We discuss the application to testlet scoring of some multiple‐category models originally developed for individual items. In the example examined, the concurrent validity of the testlet scoring equaled or exceeded that of individual‐item‐level scoring.