Interpreting the results of observational research: chance is not such a fine thing
- 17 September 1994
- Vol. 309 (6956) , 727-730
- https://doi.org/10.1136/bmj.309.6956.727
Abstract
In a randomised controlled trial, if the design is not flawed, different outcomes in the study groups must be due to the intervention itself or to chance imbalances between the groups. Because of this, tests of statistical significance are used to assess the validity of results from randomised studies. Most published papers in medical research, however, describe observational studies, which do not include a randomised intervention. This paper argues that the continuing application of tests of significance to such non-randomised investigations is inappropriate. It draws a distinction between bias and chance imbalance on the one hand (both randomised and observational studies can be affected) and confounding on the other (a problem unique to observational investigations). It concludes that neither the P value nor the 95% confidence interval should be used as evidence for the validity of an observational result.

Epidemiologists and clinical researchers design studies to estimate the effect which a presumed cause or treatment has on the occurrence of a disease. Most questions about causes of disease cannot be addressed by experiments: we must rely on the observation of life as it is, rather than on the results of controlled intervention. Such observational studies cannot provide proof of causality but are still the basis for reasoned public health decisions. In the presentation of results from observational studies, significance tests are often presented as judgments on the "truth" or validity of the effect which a presumed cause has on the occurrence of a disease. In 1965 Bradford Hill lamented this application of statistics,1 a concern given prominence again recently.2 Yet almost 30 years on, phrases such as "the result just failed to reach statistical significance" are still part of the argot of medical papers and presentations. The move towards estimating confidence intervals has not resolved this problem, as the …
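The distinction the paper draws can be made concrete with a short simulation (not from the paper; the variable names, prevalences, and effect sizes below are illustrative assumptions). A confounder C drives both exposure E and outcome Y, while E has no causal effect on Y at all. A crude comparison of exposed and unexposed groups nevertheless yields a large risk difference with a vanishingly small P value; only stratifying on C reveals that the association is spurious. No amount of significance testing on the crude comparison addresses the real problem.

```python
import math
import random

random.seed(1)
N = 20_000

def z_test(k1, n1, k0, n0):
    """Two-proportion z-test: returns (risk difference, two-sided P value)."""
    p1, p0 = k1 / n1, k0 / n0
    pooled = (k1 + k0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    z = (p1 - p0) / se
    # Normal-approximation P value via the error function.
    return p1 - p0, 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Confounder C raises both the chance of exposure E and the risk of
# outcome Y; given C, the outcome is independent of the exposure.
data = []
for _ in range(N):
    c = random.random() < 0.5
    e = random.random() < (0.8 if c else 0.2)
    y = random.random() < (0.5 if c else 0.1)   # depends on C only, not E
    data.append((c, e, y))

# Crude (unadjusted) analysis: a "highly significant" association
# despite there being no causal effect of E on Y.
k1 = sum(y for c, e, y in data if e)
n1 = sum(1 for c, e, y in data if e)
k0 = sum(y for c, e, y in data if not e)
n0 = sum(1 for c, e, y in data if not e)
crude_rd, crude_p = z_test(k1, n1, k0, n0)
print(f"crude risk difference {crude_rd:.3f}, P = {crude_p:.2g}")

# Stratifying on the confounder removes the spurious association.
strata = {}
for stratum in (False, True):
    s = [(e, y) for c, e, y in data if c == stratum]
    k1 = sum(y for e, y in s if e)
    n1 = sum(1 for e, y in s if e)
    k0 = sum(y for e, y in s if not e)
    n0 = sum(1 for e, y in s if not e)
    strata[stratum] = z_test(k1, n1, k0, n0)
    rd, p = strata[stratum]
    print(f"stratum C={stratum}: risk difference {rd:.3f}, P = {p:.2g}")
```

The point of the sketch is the paper's: the tiny crude P value quantifies only the role of chance, which was never the threat here; the threat was confounding, which randomisation would have broken but observation cannot.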
This publication has 13 references indexed in Scilit:
- Bias in analytic research. Published by Elsevier, 2004
- The glitter of the t table. The Lancet, 1993
- Smoking as "independent" risk factor for suicide: illustration of an artifact from observational epidemiology? The Lancet, 1992
- Prevention of neural tube defects: results of the Medical Research Council Vitamin Study. Published by Elsevier, 1991
- Randomization, statistics, and causal inference. Epidemiology, 1990
- Beyond the confidence interval. American Journal of Public Health, 1987
- Confidence intervals rather than P values: estimation rather than hypothesis testing. BMJ, 1986
- Further experience of vitamin supplementation for prevention of neural tube defect recurrences. The Lancet, 1983
- Controlled, randomised trial of the effect of dietary fat on blood pressure. The Lancet, 1983
- A show of confidence. New England Journal of Medicine, 1978