Abstract
When determining how many items to include on a criterion-referenced test, practitioners must resolve various nonstatistical issues before a particular solution can be applied. A fundamental problem is deciding which of three true scores should be used. The first is based on the probability that an examinee is correct on a "typical" test item. The second is the probability of having acquired a typical skill among a domain of skills, and the third is based on latent trait models. Once a particular true score is settled upon, there are several perspectives that might be used to determine test length. The paper reviews and critiques these solutions. Some new results are described that apply when latent structure models are used to estimate an examinee's true score.
