Why Most Discovered True Associations Are Inflated
- 1 September 2008
- journal article
- review article
- Published by Wolters Kluwer Health in Epidemiology
- Vol. 19 (5) , 640-648
- https://doi.org/10.1097/ede.0b013e31818131e7
Abstract
Newly discovered true (non-null) associations often have inflated effects compared with the true effect sizes. I discuss here the main reasons for this inflation. First, theoretical considerations prove that when true discovery is claimed based on crossing a threshold of statistical significance and the discovery study is underpowered, the observed effects are expected to be inflated. This has been demonstrated in various fields ranging from early stopped clinical trials to genome-wide associations. Second, flexible analyses coupled with selective reporting may inflate the published discovered effects. The vibration ratio (the ratio of the largest vs. smallest effect on the same association approached with different analytic choices) can be very large. Third, effects may be inflated at the stage of interpretation due to diverse conflicts of interest. Discovered effects are not always inflated, and under some circumstances may be deflated, for example, in the setting of late discovery of associations in sequentially accumulated overpowered evidence, in some types of misclassification from measurement error, and in conflicts causing reverse biases. Finally, I discuss potential approaches to this problem. These include being cautious about newly discovered effect sizes, considering some rational down-adjustment, using analytical methods that correct for the anticipated inflation, ignoring the magnitude of the effect (if not necessary), conducting large studies in the discovery phase, using strict protocols for analyses, pursuing complete and transparent reporting of all results, placing emphasis on replication, and being fair with interpretation of results.
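The first mechanism in the abstract (inflation when discovery is conditioned on statistical significance in an underpowered study) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's own analysis: study estimates are assumed to be normally distributed around the true effect with a known standard error, and the function name, the true effect of 0.2, the standard error of 0.15, and the two-sided 1.96 threshold are all illustrative choices that do not come from the source.

```python
import random
import statistics


def simulate_significant_estimates(true_effect, se, n_studies, z_crit=1.96, seed=0):
    """Simulate study estimates and keep the ones that cross the significance threshold.

    Assumption (illustrative): each estimate ~ Normal(true_effect, se),
    and a study "discovers" the association when |estimate / se| > z_crit.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    significant = [e for e in estimates if abs(e / se) > z_crit]
    return estimates, significant


# Hypothetical setting: a real but modest effect studied with low power.
true_effect = 0.2
se = 0.15

all_est, sig_est = simulate_significant_estimates(true_effect, se, n_studies=20000)

power = len(sig_est) / len(all_est)      # share of studies reaching significance
mean_all = statistics.fmean(all_est)     # close to the true effect (no selection)
mean_sig = statistics.fmean(sig_est)     # average among "discoveries" only
inflation = mean_sig / true_effect       # ratio above 1 indicates inflation

print(f"empirical power:               {power:.2f}")
print(f"mean estimate, all studies:    {mean_all:.3f}")
print(f"mean estimate, significant:    {mean_sig:.3f}")
print(f"inflation among discoveries:   {inflation:.2f}x")
```

In this setup the average over all simulated studies stays close to the true effect, while the average over only the significant studies is larger, because an estimate must exceed roughly 1.96 standard errors in absolute value to count as a discovery. Shrinking `se` (a larger, better-powered study) narrows that gap, which matches the abstract's suggestion of conducting large studies in the discovery phase.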