Current Concepts Review - Sample Size and Statistical Power in Clinical Orthopaedic Research
- 1 October 1999
- journal article
- review article
- Published by Wolters Kluwer Health in Journal of Bone and Joint Surgery
- Vol. 81 (10), 1454-60
- https://doi.org/10.2106/00004623-199910000-00011
Abstract
Classic principles of treatment in orthopaedic surgery - the immobilization of fractures or the draining of infected wounds, for example - were not first established in prospective clinical trials or laboratory experiments. Rather, they were derived from perceptive observation: the methods were seen to work in practice, and they were retained. Observation has a noble history in medicine and science. Still, the modern reader is at least intuitively aware of the limitations of mere observation. Imagine if an investigator were to claim that prophylaxis against deep-vein thrombosis after hip replacement is not needed simply because only two thromboses were observed in ten patients who did not receive prophylaxis compared with three thromboses in ten patients who did. Such a study, were it to be published, would be the object of ridicule.

A single clinical observation is actually a sample of the entire set of possible observations, and methods of statistical inference are needed to draw valid conclusions. Almost all clinical research for the assessment of treatments and outcomes relies on statistical sampling - that is, a set of rules that help to ensure that the individuals included in a study are representative of the larger population to which the investigator wishes to generalize the findings. After the data have been gathered, statistical inference is applied. Statistical inference is a mechanism for evaluating an observed finding (for example, a difference between two treatment groups) relative to differences that could have occurred by chance alone, given the observed variability in measurements. This allows statements about an entire population to be made without the necessity of studying every member of that population - which is rarely feasible even if desired.

To determine whether observed differences represent true differences, as opposed to differences that could be expected to occur by random chance alone, the investigator uses statistical tests. These tests give the probability, under the null hypothesis of no true underlying difference, of observing a difference as extreme as or more extreme than the one actually observed. When this probability is small, it may be concluded that there is a real difference between the two populations that the samples represent. This is the meaning of the term significant. Of course, the testing of samples from two truly distinct groups will not always disprove the null hypothesis; such a test may fail to show that the groups are significantly different. Stated another way, failure to prove that two groups are different is not equivalent to proving that they are the same. Accordingly, if no significant differences are found, the reader may ask whether the investigator failed to demonstrate significant differences because the samples did not truly differ or because they did differ but the difference was not demonstrated. Many readers and researchers are aware of the need to consider the possibility that two samples that seem to differ actually come from a single distribution and therefore do not really differ. Thus, they are familiar with p values and their associated alpha threshold (typically, p < 0.05).
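The abstract's hypothetical deep-vein-thrombosis comparison can be worked through numerically. The following is a minimal sketch in Python, assuming the scipy and statsmodels packages are available; the 30%-versus-20% event rates used in the power calculation are illustrative assumptions, not figures from the article.

```python
# A sketch (not from the article) of the abstract's hypothetical example:
# 2 of 10 thromboses without prophylaxis versus 3 of 10 with prophylaxis.
from scipy.stats import fisher_exact
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# 2x2 table: rows = treatment groups, columns = (thrombosis, no thrombosis).
table = [[2, 8],   # no prophylaxis: 2 of 10 thromboses
         [3, 7]]   # prophylaxis:    3 of 10 thromboses
odds_ratio, p_value = fisher_exact(table)
print(f"Fisher's exact test: p = {p_value:.2f}")  # p = 1.00, far from significant

# Sample size needed to detect an assumed (illustrative) drop in thrombosis
# rate from 30% to 20%, at alpha = 0.05 with 80% power.
effect = proportion_effectsize(0.30, 0.20)   # Cohen's h for two proportions
n_per_group = NormalIndPower().solve_power(effect_size=effect,
                                           alpha=0.05, power=0.80)
print(f"Patients required per group: {n_per_group:.0f}")  # roughly 146
```

On these data Fisher's exact test returns p = 1.0, and under the assumed rates roughly 150 patients per group would be needed to reach conventional significance with 80% power - an illustration of how far a ten-patient-per-group study falls short, which is the type II error problem the article addresses.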