Incorporating the number of true null hypotheses to improve power in multiple testing: application to gene microarray data

30 August 2007

journal article
research article
Published by Taylor & Francis in Journal of Statistical Computation and Simulation

Vol. 77 (9) , 757-767
https://doi.org/10.1080/10629360600648651

Abstract

Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. In common exploratory microarray experiments, most genes are not expected to be differentially expressed. Controlling the family-wise error (FWE) rate and controlling the false discovery rate (FDR) are two common approaches to accounting for multiple hypothesis tests when identifying differentially expressed genes. When the number of hypotheses is very large and some null hypotheses are expected to be true, the power of an FWE or FDR procedure can be improved if the number of true null hypotheses is known. The mean of differences (MD) of ranked p-values has been proposed to estimate the number of true null hypotheses under the independence model. This article proposes to incorporate the MD estimate into an FWE or FDR approach for gene identification. Simulation results show that the procedure appears to control the FWE and FDR well at the FWE=0.05 and FDR=0.05 significance levels; it exceeds the nominal level at FDR=0.01 when the null hypotheses are highly correlated (a correlation of 0.941). The proposed approach is applied to a public colon tumor data set for illustration.
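The idea of plugging an estimate of the number of true null hypotheses into an FWE or FDR procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the function names are hypothetical, the FWE step is taken to be a Bonferroni-type cutoff and the FDR step a Benjamini-Hochberg step-up rule with the total number of tests replaced by the estimate m0, and `estimate_m0_spacing` is a simplified spacing-based stand-in rather than the MD estimator itself, whose exact definition is not given in the abstract.

```python
def estimate_m0_spacing(pvalues, top_fraction=0.5):
    """Simplified stand-in for an m0 estimator (NOT the article's MD estimator).

    Assumes the largest p-values come from true nulls and are roughly uniform,
    so neighbouring ordered null p-values are about 1 / (m0 + 1) apart.
    """
    p = sorted(pvalues)
    m = len(p)
    j = max(1, int(m * top_fraction))  # number of largest p-values used
    # 1 - p[m - j] covers j gaps, counting the gap from the largest p-value to 1.
    mean_spacing = (1.0 - p[m - j]) / j
    if mean_spacing <= 0:
        return m
    m0 = round(1.0 / mean_spacing - 1.0)
    return min(m, max(1, m0))


def adaptive_bonferroni(pvalues, m0, alpha=0.05):
    """FWE-type rule: reject when p <= alpha / m0 (m0 used in place of m)."""
    return [p <= alpha / m0 for p in pvalues]


def adaptive_bh(pvalues, m0, alpha=0.05):
    """FDR-type step-up rule: find the largest rank i with p_(i) <= i * alpha / m0
    and reject every hypothesis whose p-value has rank <= i."""
    order = sorted(range(len(pvalues)), key=lambda idx: pvalues[idx])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m0:
            cutoff_rank = rank
    reject = [False] * len(pvalues)
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Toy data (hypothetical): 5 small p-values plus 15 evenly spread "null" p-values.
    signals = [0.0001, 0.0004, 0.001, 0.003, 0.015]
    nulls = [k / 16 for k in range(1, 16)]
    pvals = signals + nulls

    m = len(pvals)
    m0_hat = estimate_m0_spacing(pvals)
    print("m =", m, "estimated m0 =", m0_hat)
    print("Bonferroni rejections (m): ", sum(adaptive_bonferroni(pvals, m)))
    print("Bonferroni rejections (m0):", sum(adaptive_bonferroni(pvals, m0_hat)))
    print("BH rejections (m): ", sum(adaptive_bh(pvals, m)))
    print("BH rejections (m0):", sum(adaptive_bh(pvals, m0_hat)))
```

Passing the total number of tests m as `m0` recovers the ordinary procedures, so the same functions allow a side-by-side comparison. Because the estimate never exceeds m, the adjusted cutoffs are never stricter than the unadjusted ones, which is where any gain in power would come from.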
