Variance of the Number of False Discoveries

Abstract

Summary: In high-throughput genomic work, a very large number d of hypotheses are tested based on n≪d data samples. The large number of tests necessitates an adjustment for false discoveries in which a true null hypothesis was rejected. The expected number of false discoveries is easy to obtain. Dependences between the hypothesis tests greatly affect the variance of the number of false discoveries. Assuming that the tests are independent gives an inadequate variance formula. The paper presents a variance formula that takes account of the correlations between test statistics. That formula involves O(d²) correlations, and so a naïve implementation has cost O(nd²). A method based on sampling pairs of tests allows the variance to be approximated at a cost that is independent of d.
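The pair-sampling idea can be illustrated with a minimal sketch. It is not the paper's formula, only an illustration under stated assumptions: all d null hypotheses are true, each test is a two-sided z-type test at level alpha, and any two test statistics are bivariate normal with correlation equal to the sample correlation of the corresponding data columns. Under those assumptions the variance of the number of false discoveries V is d·alpha·(1−alpha) plus the sum of the d(d−1) pairwise covariances of the rejection indicators, and that sum can be estimated from a random sample of pairs. The function names `joint_rejection_prob`, `sample_corr` and `approx_variance` are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def joint_rejection_prob(rho, alpha, grid=400):
    """P(|Z1| > c and |Z2| > c) for a standard bivariate normal pair with
    correlation rho, where c is the two-sided level-alpha critical value.
    Midpoint-rule integration over z1 > c, doubled by symmetry."""
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)
    rho = max(-0.999999, min(0.999999, rho))  # keep conditional sd positive
    s = math.sqrt(1 - rho * rho)              # sd of Z2 given Z1
    h = 8.0 / grid                            # integrate over [c, c + 8]
    total = 0.0
    for k in range(grid):
        z = c + (k + 0.5) * h
        # Z2 | Z1 = z is N(rho * z, 1 - rho^2)
        cond = (STD_NORMAL.cdf((-c - rho * z) / s)
                + 1 - STD_NORMAL.cdf((c - rho * z) / s))
        total += STD_NORMAL.pdf(z) * cond * h
    return 2 * total


def sample_corr(x, y):
    """Pearson correlation of two equal-length sequences; O(n) cost."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def approx_variance(columns, alpha, n_pairs, rng):
    """Approximate Var(V) from n_pairs randomly sampled pairs of tests.
    columns: list of d sequences, each holding n observations.
    The work is O(n * n_pairs) and does not grow with d."""
    d = len(columns)
    cov_sum = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(d), 2)  # two distinct tests
        rho = sample_corr(columns[i], columns[j])
        cov_sum += joint_rejection_prob(rho, alpha) - alpha ** 2
    mean_cov = cov_sum / n_pairs
    # diagonal terms + d(d-1) times the estimated mean pairwise covariance
    return d * alpha * (1 - alpha) + d * (d - 1) * mean_cov


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: d = 200 columns sharing a common factor, n = 20.
    factor = [rng.gauss(0, 1) for _ in range(20)]
    cols = [[0.5 * f + rng.gauss(0, 1) for f in factor] for _ in range(200)]
    print("independence formula:", 200 * 0.05 * 0.95)
    print("pair-sampling estimate:", approx_variance(cols, 0.05, 200, rng))
```

The sampled pairs estimate the mean pairwise covariance, so accuracy depends on the number of sampled pairs rather than on d. Note that sample correlations computed from small n are themselves noisy, which this sketch does not correct for.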

Funding Information

US National Science Foundation (DMS-0306612)
