Abstract
In many situations a variety of tests are available to test essentially the same null hypothesis. In practice the statistician who fails to reject with the first test used will sometimes try several others, stopping when he obtains the hoped-for significance. This raises the type I error rate, but no broad study has previously been made to determine by how much. Here the effect of such multiple testing is investigated by simulating two-sample data and studying five common tests: the t, Wilcoxon-Mann-Whitney, t on logs, Yuen-Dixon trimmed t, and Welch's test.
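The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' procedure, and several choices are assumptions made only for illustration: both samples are drawn from the same lognormal distribution (so the null hypothesis holds and logs are defined), the sample size is 40 per group, every statistic is referred to the standard normal distribution rather than its exact reference distribution, and the trimmed test is the unequal-variance Yuen form with 20% trimming. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()

def two_sided_p(z):
    # Assumption: large-sample normal approximation for every statistic.
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))

def pooled_t(x, y):
    # Classical two-sample t statistic with pooled variance.
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))

def welch_t(x, y):
    # Welch statistic: variances are not pooled.
    se2 = variance(x) / len(x) + variance(y) / len(y)
    return (mean(x) - mean(y)) / math.sqrt(se2)

def wmw_z(x, y):
    # Wilcoxon rank-sum statistic, standardized (continuous data, no ties).
    n, m = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(r for r, (_, grp) in enumerate(pooled, start=1) if grp == 0)
    mu = n * (n + m + 1) / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    return (w - mu) / sigma

def yuen_t(x, y, trim=0.2):
    # Yuen trimmed-mean statistic using Winsorized variances.
    def parts(v):
        v = sorted(v)
        n = len(v)
        g = int(trim * n)          # observations trimmed from each tail
        h = n - 2 * g              # observations kept
        core = v[g:n - g]
        wins = [v[g]] * g + core + [v[n - g - 1]] * g
        return mean(core), (n - 1) * variance(wins) / (h * (h - 1))
    tx, dx = parts(x)
    ty, dy = parts(y)
    return (tx - ty) / math.sqrt(dx + dy)

def simulate(n_sims=2000, n=40, alpha=0.05, seed=1):
    # Returns per-test rejection rates and the rate at which
    # at least one of the five tests rejects a true null.
    rng = random.Random(seed)
    names = ["t", "wmw", "t_on_logs", "yuen", "welch"]
    rejects = dict.fromkeys(names, 0)
    any_reject = 0
    for _ in range(n_sims):
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        y = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        lx = [math.log(v) for v in x]
        ly = [math.log(v) for v in y]
        z = {
            "t": pooled_t(x, y),
            "wmw": wmw_z(x, y),
            "t_on_logs": pooled_t(lx, ly),
            "yuen": yuen_t(x, y),
            "welch": welch_t(x, y),
        }
        hits = {k: two_sided_p(z[k]) < alpha for k in names}
        for k in names:
            rejects[k] += hits[k]
        any_reject += any(hits.values())
    rates = {k: c / n_sims for k, c in rejects.items()}
    return rates, any_reject / n_sims

if __name__ == "__main__":
    rates, any_rate = simulate()
    for name, rate in rates.items():
        print(f"{name:10s} {rate:.3f}")
    print(f"{'any test':10s} {any_rate:.3f}")
```

The "any test" rate corresponds to the statistician who keeps trying tests until one is significant. By construction it can never fall below the largest single-test rate and never exceeds the sum of the five rates; how far it sits above the nominal level depends on how strongly the tests agree on the same data.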
