Estimation of False Discovery Rates in Multiple Testing: Application to Gene Microarray Data
- 11 December 2003
- journal article
- Published by Oxford University Press (OUP) in Biometrics
- Vol. 59 (4), 1071-1081
- https://doi.org/10.1111/j.0006-341x.2003.00123.x
Abstract
Summary. Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (genes declared significant) and V denotes the number of false rejections, then V/R, for R > 0, is the proportion of falsely rejected hypotheses. This paper proposes a model for the distribution of the number of rejections, R, and for the conditional distribution of V given R, V | R. Under the independence assumption, R is distributed as a convolution of two binomials and V | R follows a noncentral hypergeometric distribution. Under an equicorrelated model, the distributions are more complex and are also derived. Five false discovery rate probability error measures are considered: FDR = E(V/R), pFDR = E(V/R | R > 0) (positive FDR), cFDR = E(V/R | R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR). The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (ρ = 0.25). An example from a toxicogenomic microarray experiment is presented for illustration.
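The error measures defined in the abstract can be illustrated with a small Monte Carlo sketch under the independence assumption. This is not the paper's parametric or bootstrap estimator; it only simulates V and R directly. The function name `simulate_fdr_measures`, the parameter values (160 true nulls, 40 true alternatives, per-test level 0.05, per-test power 0.6), and the convention V/R = 0 when R = 0 are all assumptions chosen for illustration.

```python
import random


def simulate_fdr_measures(m0=160, m1=40, alpha=0.05, power=0.6,
                          n_sims=2000, seed=0):
    """Monte Carlo estimates of FDR, pFDR and mFDR under independence.

    Assumed setup (not taken from the paper): m0 true nulls, each falsely
    rejected with probability alpha; m1 true alternatives, each rejected
    with probability power. R = V + S is then a sum of two binomials.
    """
    rng = random.Random(seed)
    ratios = []       # V/R, using the assumed convention V/R = 0 when R = 0
    ratios_pos = []   # V/R restricted to replicates with R > 0
    total_v = 0
    total_r = 0
    for _ in range(n_sims):
        v = sum(rng.random() < alpha for _ in range(m0))   # false rejections
        s = sum(rng.random() < power for _ in range(m1))   # true rejections
        r = v + s
        total_v += v
        total_r += r
        if r > 0:
            ratios.append(v / r)
            ratios_pos.append(v / r)
        else:
            ratios.append(0.0)
    return {
        "FDR": sum(ratios) / n_sims,                        # E(V/R)
        "pFDR": (sum(ratios_pos) / len(ratios_pos)
                 if ratios_pos else float("nan")),          # E(V/R | R > 0)
        "mFDR": (total_v / total_r
                 if total_r else float("nan")),             # E(V)/E(R)
    }


if __name__ == "__main__":
    for name, value in simulate_fdr_measures().items():
        print(f"{name}: {value:.4f}")
```

With these assumed settings, E(V) = 160 × 0.05 = 8 and E(R) = 8 + 40 × 0.6 = 32, so the simulated mFDR should fall near 8/32 = 0.25. Because R = 0 is very unlikely here, the FDR and pFDR estimates nearly coincide.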