What's wrong with Bonferroni adjustments
- 18 April 1998
- Vol. 316 (7139) , 1236-1238
- https://doi.org/10.1136/bmj.316.7139.1236
Abstract
When more than one statistical test is performed in analysing the data from a clinical study, some statisticians and journal editors demand that a more stringent criterion be used for “statistical significance” than the conventional P<0.05.¹ Many well-meaning researchers, eager for methodological rigour, comply without fully grasping what is at stake. Recently, adjustments for multiple tests (or Bonferroni adjustments) have found their way into introductory texts on medical statistics, which has increased their apparent legitimacy. This paper advances the view, widely held by epidemiologists, that Bonferroni adjustments are, at best, unnecessary and, at worst, deleterious to sound statistical inference.

#### Summary points

- Adjusting statistical significance for the number of tests that have been performed on study data (the Bonferroni method) creates more problems than it solves
- The Bonferroni method is concerned with the general null hypothesis (that all null hypotheses are true simultaneously), which is rarely of interest or use to researchers
- The main weakness is that the interpretation of a finding depends on the number of other tests performed
- The likelihood of type II errors is also increased, so that truly important differences are deemed non-significant
- Simply describing what tests of significance have been performed, and why, is generally the best way of dealing with multiple comparisons

Bonferroni adjustments are based on the following reasoning.¹⁻³ If a null hypothesis is true (for instance, two treatment groups in a randomised trial do not differ in terms of cure rates), a significant difference (P<0.05) will be observed by chance, on average, once in 20 trials. This is the type I error, or α. When 20 independent tests are performed (for example, study groups are compared with regard to 20 unrelated variables) and the null hypothesis holds for all 20 comparisons, the chance of at least one test being significant is no longer 0.05, but 1 − 0.95²⁰ ≈ 0.64. …
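The arithmetic behind the 0.64 figure, and the type II error cost mentioned in the summary points, can be checked with a short sketch. This is not code from the paper: it assumes fully independent tests, takes the Bonferroni per-test level to be α/n, and uses a two-sided z test for the power calculation. The function names and the hypothetical true effect of 2.8 standard errors are illustrative choices only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n independent tests gives P < alpha
    when every null hypothesis is true: 1 - (1 - alpha) ** n."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni method: alpha / n."""
    return alpha / n_tests


def power_two_sided_z(effect_se: float, alpha: float) -> float:
    """Power of a two-sided z test when the true effect lies effect_se
    standard errors away from zero (illustrative assumption)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability the test statistic lands beyond either critical value
    return STD_NORMAL.cdf(effect_se - z_crit) + STD_NORMAL.cdf(-effect_se - z_crit)


alpha, n = 0.05, 20
adjusted = bonferroni_threshold(alpha, n)
print(f"Family-wise error, {n} tests: {familywise_error(alpha, n):.2f}")  # 0.64
print(f"Bonferroni per-test threshold: {adjusted}")  # 0.0025
# Hypothetical true effect of 2.8 standard errors, tested with and without adjustment
print(f"Power at P<{alpha}: {power_two_sided_z(2.8, alpha):.2f}")
print(f"Power at P<{adjusted}: {power_two_sided_z(2.8, adjusted):.2f}")
```

Under these assumptions, an effect that an unadjusted test would detect with power of about 0.80 is detected with power of only about 0.41 once the threshold is lowered to 0.0025. This illustrates how the adjustment trades type I errors for type II errors, and how the verdict on one comparison shifts with the number of other tests performed.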
References
- Faculty Opinions recommendation of “No adjustments are needed for multiple comparisons”. H1 Connect, 2013.
- How to read a paper: Statistics for the non-statistician. I: Different types of data need different statistical tests. BMJ, 1997.
- Multiple Comparisons and Related Issues in the Interpretation of Epidemiologic Data. American Journal of Epidemiology, 1995.
- Statistics notes: Multiple significance tests: the Bonferroni method. BMJ, 1995.
- No Adjustments Are Needed for Multiple Comparisons. Epidemiology, 1990.
- Evidence and scientific research. American Journal of Public Health, 1988.
- Simultaneous Inference in Epidemiological Studies. International Journal of Epidemiology, 1982.
- Some Thoughts on Clinical Trials, Especially Problems of Multiplicity. Science, 1977.
- Tests of Significance Considered as Evidence. Journal of the American Statistical Association, 1942.
- On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference: Part I. Biometrika, 1928.