Abstract
Responses to a 40-item, four-choice test were simulated for 120 examinees under six response-scoring modes including number-right, corrected-for-guessing and answer-until-correct. Separate score sets were generated to reflect five levels of prevalence of misinformation (belief that an answer is a distractor) and five levels of propensity-to-guess contrary to instructions for modes designed to inhibit guessing. Criteria were simulated using the number-right mode with five levels of misinformation prevalence and four levels of true-score relationship with the predictor. The entire process was repeated with the introduction of normally distributed, random error at the item level. This process yielded 260 sets of five scores (predictor and four criteria), which were examined to determine differential effects on reliability and validity attributable to the response-scoring modes. Modes permitting multiple responses to an item were found to yield genuine increases in internal consistency reliability, which tended to carry over into validity coefficients. However, the validity differences among all the response-scoring modes simulated were small, probably too small to justify the additional cost and complexity of modes other than number-right.
