Research News and Comment: Reflections on Statistical and Substantive Significance, With a Slice of Replication

Abstract
In this comment, we propose some modifications to Thompson’s (1996) recent suggestions for AERA editorial policy on statistical significance testing. First, we discuss the potential problems, both procedural and conceptual, that could arise from professional journals mandating the addition of the modifier “statistically” to “significant.” If language remains an issue of consequence, however, findings stemming from rejected null hypotheses could be termed “nonchance” to communicate simply and more precisely what it is that a statistical test reveals. Second, although asking authors to provide explicit effect-size information is both sensible and useful, we illustrate how effect sizes (like p values) can be misinterpreted and misused. Finally, in accord with Thompson, we argue that greater attention to replication should be encouraged in educational research. At the same time, we believe that internal replication analyses do not represent adequate substitutes for (or even close approximations to) external replication studies based on new, independent participants. With journals welcoming both constructive and literal replication studies, consumers of the educational research literature can concentrate less on the limited believability of one-time findings and more on the enhanced believability of repeatable and generalizable ones.