Incorporating the number of true null hypotheses to improve power in multiple testing: application to gene microarray data
- 30 August 2007
- journal article
- research article
- Published by Taylor & Francis in Journal of Statistical Computation and Simulation
- Vol. 77 (9), 757-767
- https://doi.org/10.1080/10629360600648651
Abstract
Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. In typical exploratory microarray experiments, most genes are not expected to be differentially expressed. The family-wise error (FWE) rate and the false discovery rate (FDR) are two common approaches used to account for multiple hypothesis tests when identifying differentially expressed genes. When the number of hypotheses is very large and some null hypotheses are expected to be true, the power of an FWE or FDR procedure can be improved if the number of true null hypotheses is known. The mean of differences (MD) of ranked p-values has been proposed to estimate the number of true null hypotheses under the independence model. This article proposes incorporating the MD estimate into an FWE or FDR approach for gene identification. Simulation results show that the procedure appears to control the FWE and FDR well at the FWE = 0.05 and FDR = 0.05 significance levels; however, it exceeds the nominal level at FDR = 0.01 when the null hypotheses are highly correlated (correlation 0.941). The proposed approach is applied to a public colon tumor data set for illustration.
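The plug-in idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `estimate_m0_spacing` is a hypothetical, simplified spacing estimator inspired by the MD idea: it averages the differences among the largest ordered p-values (with `tail_fraction` a hypothetical tuning parameter) and inverts the uniform-spacing mean 1/(m0 + 1); the paper's exact MD rule may differ. The abstract also does not state how the estimate enters the FWE or FDR procedure, so here the estimate simply replaces the total number of tests m in the Bonferroni threshold alpha/m and in the Benjamini-Hochberg step-up threshold i*alpha/m. The p-values in the usage lines are made up for illustration.

```python
import math
from typing import List, Sequence


def estimate_m0_spacing(pvalues: Sequence[float], tail_fraction: float = 0.5) -> int:
    """Estimate m0, the number of true null hypotheses, from p-value spacings.

    Simplified, MD-inspired sketch (assumption, not the paper's exact rule):
    true-null p-values are taken as independent Uniform(0, 1), so consecutive
    gaps among the largest ordered p-values average about 1 / (m0 + 1).
    """
    m = len(pvalues)
    if m == 0:
        return 0
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    q = sorted(pvalues) + [1.0]                # ordered p-values, with p_(m+1) = 1 appended
    k = max(1, math.ceil(tail_fraction * m))   # hypothetical: number of upper-tail gaps averaged
    # Differences d_j = p_(j) - p_(j-1) over the k largest positions (0-based j = m-k+1 .. m).
    gaps = [q[j] - q[j - 1] for j in range(m - k + 1, m + 1)]
    mean_gap = sum(gaps) / k                   # the "mean of differences"
    if mean_gap <= 0.0:
        return m                               # degenerate case: tail p-values all equal 1
    m0 = round(1.0 / mean_gap - 1.0)           # invert E[gap] = 1 / (m0 + 1)
    return min(m, max(1, m0))                  # keep the estimate within [1, m]


def adaptive_bonferroni(pvalues: Sequence[float], alpha: float, m0: int) -> List[int]:
    """Reject H_i when p_i <= alpha / m0; m0 = len(pvalues) gives plain Bonferroni."""
    threshold = alpha / m0
    return [i for i, pv in enumerate(pvalues) if pv <= threshold]


def adaptive_bh(pvalues: Sequence[float], alpha: float, m0: int) -> List[int]:
    """Benjamini-Hochberg step-up with m0 in place of m.

    Finds the largest rank r with p_(r) <= r * alpha / m0 and rejects the r
    hypotheses with the smallest p-values; m0 = len(pvalues) gives plain BH.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m0:
            cutoff = rank                      # step-up: keep the largest qualifying rank
    return sorted(order[:cutoff])              # rejected indices, in original order


if __name__ == "__main__":
    p = [0.001, 0.006, 0.017, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
    m0_hat = estimate_m0_spacing(p)
    print(m0_hat)                                # 7
    print(adaptive_bonferroni(p, 0.05, m0_hat))  # [0, 1]
    print(adaptive_bonferroni(p, 0.05, len(p)))  # [0]
    print(adaptive_bh(p, 0.05, m0_hat))          # [0, 1, 2]
    print(adaptive_bh(p, 0.05, len(p)))          # [0, 1]
```

Because the estimate never exceeds m, both thresholds can only grow relative to the ordinary procedures, which is the source of the power gain; passing `len(p)` as `m0` recovers plain Bonferroni and BH. Like the MD estimator as described, this sketch assumes independent p-values, so it says nothing about behavior under the highly correlated settings mentioned above.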