Endodontic leakage studies reconsidered. Part II. Statistical aspects

Abstract
The aim of many endodontic studies is to compare two or more treatment methods, techniques or materials, for example by testing for differences in mean leakage scores. As it is not feasible to study large populations, samples are taken. The important question then arises as to how large the samples must be in order to establish the 'true' (population) mean scores. First, it must be determined what magnitude of difference (= v) between the mean scores is of endodontic interest. Based on v and a few related statistical parameters, one can calculate how large the samples must be for a statistical test to yield a significant result when a difference of endodontic importance exists. In other words, the 'power' of a test, which depends on the sample size among other factors, must be large enough to detect the 'true', a priori determined difference between the populations. With small sample sizes, even a rather large difference between two mean leakage scores may not be found significant, which can lead to incorrect conclusions. This article describes power and the related statistical factors that determine adequate sample size. Examples of power calculations are presented. Next, the power of published endodontic leakage studies was evaluated. Almost two-thirds of the sample sizes were 10 or less, and about 90% were 20 or less. Fewer than one-half of the tests had adequate power (conventionally ≥ 0.80). Because of the limited power of the statistical tests, caution is needed when extrapolating the results of such studies. Power may be increased by using larger sample sizes or, alternatively, by enlarging the 'effect size', either by taking an interest in a larger difference between the mean scores or by minimizing the variability of the data.
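
The relationship between sample size, effect size and power described in the abstract can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used in the article: it assumes two independent groups of equal size, a two-sided test at significance level alpha, and a normal approximation to the test statistic, with the effect size taken as d = v / SD (the difference of interest divided by the common standard deviation). The function names and the numerical values are hypothetical and serve only as illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution, used for the normal approximation (an assumption).
_STD_NORMAL = NormalDist()


def effect_size(v, sd):
    """Standardized effect size: difference of interest v divided by the common SD."""
    return v / sd


def power_two_sample(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation; the negligible probability of rejecting in the
    wrong direction is ignored.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = abs(d) * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(noncentrality - z_crit)


def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Approximate number of specimens per group needed to reach the given power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / abs(d)) ** 2)


if __name__ == "__main__":
    # Hypothetical values: a difference of interest of 1.0 score unit, SD of 1.0.
    d = effect_size(v=1.0, sd=1.0)
    print("n per group for power 0.80:", sample_size_per_group(d))
    print("power with 10 per group:", round(power_two_sample(10, d), 2))
    # Halving the effect size (smaller difference of interest or larger SD).
    print("n per group for d = 0.5:", sample_size_per_group(0.5))
```

Under these assumptions, an effect size of 1.0 calls for about 16 specimens per group to reach a power of 0.80, whereas groups of 10 give a power of roughly 0.6. Halving the effect size roughly quadruples the required sample size. An exact calculation based on the t distribution gives slightly larger sample sizes, so the figures from this sketch should be read as lower bounds.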