Combining p‐values in large‐scale genomics experiments

Abstract
In large‐scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: which of the L tests represent true associations (TAs)? The traditional way to control false findings is to adjust each individual p‐value for multiple testing. When there are multiple TAs, p‐value combination methods offer certain advantages. Both Fisher's and Lancaster's combination methods use an inverse gamma transformation. We identify how the shape parameter of that distribution relates to an implicit threshold value: the inverse gamma method (GM) favors p‐values below that threshold. We use this feature to improve power over Fisher's method when L is large and the number of TAs is moderate. However, combination methods gain this power at the cost of a weaker claim upon rejection of the null hypothesis: only that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p‐values smaller than L, we investigate two methods with explicit truncation: the rank truncated product method (RTP), which combines the K smallest p‐values, and the truncated product method (TPM), which combines the p‐values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p‐values, whereas the claim of RTP, like that of GM, applies more appropriately to all L tests. Across a range of simulations, GM gives somewhat higher power than the TPM, RTP, Fisher, and Simes methods. Copyright © 2007 John Wiley & Sons, Ltd.
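The combination statistics named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the L p‐values are independent and uniform on (0, 1) under the null hypothesis, and the function names (`fisher`, `gamma_method`, `tpm`, `rtp`) are hypothetical. GM is written here as a sum of Gamma(shape, 1) quantiles evaluated at 1 − p_i. Under the null, that sum follows a Gamma(L·shape, 1) distribution, and shape = 1 recovers Fisher's method. For TPM and RTP, the null distribution is approximated by brute‐force Monte Carlo simulation rather than by any closed form. This is an assumption made for simplicity, and it is practical only for modest L and n_sim.

```python
import numpy as np
from scipy import stats


def fisher(pvals):
    """Fisher's method: -2 * sum(ln p) ~ chi-square with 2L df under H0."""
    p = np.asarray(pvals, dtype=float)
    x = -2.0 * np.log(p).sum()
    return stats.chi2.sf(x, df=2 * p.size)


def gamma_method(pvals, shape=1.0):
    """Inverse-gamma-CDF combination (GM); shape=1 reproduces Fisher's method.

    Each p_i is mapped to the Gamma(shape, 1) quantile at 1 - p_i (the
    inverse survival function at p_i). Under H0 (independent uniform p-values)
    these are i.i.d. Gamma(shape, 1), so their sum is Gamma(L * shape, 1).
    """
    p = np.asarray(pvals, dtype=float)
    total = stats.gamma.isf(p, a=shape).sum()
    return stats.gamma.sf(total, a=p.size * shape)


def _mc_pvalue(obs_log_w, null_log_w):
    # Small products are evidence against H0, so count null draws <= observed.
    # The +1 terms keep the Monte Carlo p-value strictly positive.
    return (np.sum(null_log_w <= obs_log_w) + 1) / (null_log_w.size + 1)


def tpm(pvals, tau=0.05, n_sim=20000, seed=0):
    """Truncated product method: product of the p-values <= tau.

    Works on the log scale; an empty product (no p <= tau) has log 0.
    The null distribution is simulated (hypothetical Monte Carlo shortcut).
    """
    p = np.asarray(pvals, dtype=float)
    obs = np.log(p[p <= tau]).sum()
    null = np.random.default_rng(seed).uniform(size=(n_sim, p.size))
    null_log_w = np.where(null <= tau, np.log(null), 0.0).sum(axis=1)
    return _mc_pvalue(obs, null_log_w)


def rtp(pvals, k=10, n_sim=20000, seed=0):
    """Rank truncated product: product of the K smallest p-values.

    The null distribution is simulated (hypothetical Monte Carlo shortcut).
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    obs = np.log(p[:k]).sum()
    null = np.sort(np.random.default_rng(seed).uniform(size=(n_sim, p.size)), axis=1)
    null_log_w = np.log(null[:, :k]).sum(axis=1)
    return _mc_pvalue(obs, null_log_w)
```

For example, `gamma_method(p, shape=1.0)` agrees with `fisher(p)` to numerical precision. Calling `tpm` with `tau=1.0`, or `rtp` with `k=len(p)`, includes every p‐value, so both then approximate Fisher's p‐value up to Monte Carlo error. The sketch does not encode how the shape parameter should be chosen for a given L and number of TAs.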
Funding Information
  • National Institutes of Health, National Institute of Environmental Health Sciences