Abstract
This study examined the reliability and validity of subjective judgments about five characteristics of multiple-choice test items from an introductory astronomy test: (a) item difficulty, (b) language complexity, (c) content importance or relevance, (d) response set convergence, and (e) process complexity. Judgments of item difficulty were the least reliable. Judgments of all five characteristics were significantly associated with empirical values of objective item difficulty, which served as the criterion measure. Complexity exhibited the closest relationship with both empirical (objective) and judged (subjective) item difficulty.
