Abstract
Statisticians seek tests that have maximum power among tests of size α. In both numerical and theoretical studies, the standard approach is to compare the powers of competing tests that have the same nominal size α*. In most cases, however, α and α* differ, and the differing size biases of the tests then contaminate any comparison of their powers. For instance, the powers of two nominal 5% tests with actual sizes 4% and 6% should not be compared naively. In this paper, the basic problem of trading off size for power is approached through the existing theory of receiver operating characteristic (ROC) curves. This leads to a simple way of estimating power adjusted for size, not only for a fixed nominal size but also for a range of relevant nominal sizes. The calculations required are both familiar and simple. We recommend that the methods be routinely applied to simulation studies that compare alternative tests of the same hypotheses.
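As a rough illustration of the idea, the following Python sketch estimates size-adjusted power from simulated test statistics. It is a minimal sketch under stated assumptions and not necessarily the procedure developed in the paper: it assumes that each test rejects for large values of its statistic, and it takes the empirical null quantile as the critical value so that each test has actual size close to the target. The function name `size_adjusted_power`, the two example statistics, and all simulation settings are hypothetical choices made for illustration only.

```python
import numpy as np


def size_adjusted_power(null_stats, alt_stats, sizes):
    """Empirical ROC ordinates: for each target size, the proportion of
    alternative-hypothesis statistics that exceed the (1 - size) empirical
    quantile of the null-hypothesis statistics.

    Assumes the test rejects for large values of the statistic.
    """
    null_stats = np.asarray(null_stats, dtype=float)
    alt_stats = np.asarray(alt_stats, dtype=float)
    sizes = np.atleast_1d(np.asarray(sizes, dtype=float))
    # Critical values chosen so that the actual (simulated) size matches each target.
    crit = np.quantile(null_stats, 1.0 - sizes)
    # Compare every alternative statistic with every critical value.
    return (alt_stats[:, None] > crit[None, :]).mean(axis=0)


# Hypothetical simulation: one-sided tests of a zero mean with normal data.
rng = np.random.default_rng(0)
n, reps, shift = 20, 5000, 0.5
null_samples = rng.standard_normal((reps, n))
alt_samples = rng.standard_normal((reps, n)) + shift


def t_stat(x):
    # One-sample t statistic computed for each simulated sample (row).
    return x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(x.shape[1]))


def median_stat(x):
    # Sample median of each simulated sample, used as a competing statistic.
    return np.median(x, axis=1)


sizes = np.linspace(0.01, 0.20, 20)  # range of relevant sizes
power_t = size_adjusted_power(t_stat(null_samples), t_stat(alt_samples), sizes)
power_med = size_adjusted_power(
    median_stat(null_samples), median_stat(alt_samples), sizes
)

for a, pt, pm in zip(sizes, power_t, power_med):
    print(f"size={a:.2f}  t-test power={pt:.3f}  median-test power={pm:.3f}")
```

Plotting `power_t` and `power_med` against `sizes` gives an empirical ROC curve for each test, so the two tests can be compared at equal actual size across the whole range rather than at a single nominal level.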