Independent filtering increases detection power for high-throughput experiments

11 May 2010

journal article
research article
Published in Proceedings of the National Academy of Sciences

Vol. 107 (21), 9546-9551
https://doi.org/10.1073/pnas.0914005107

Abstract

With high-dimensional data, variable-by-variable statistical testing is often used to select variables whose behavior differs across conditions. Such an approach requires adjustment for multiple testing, which can result in low statistical power. A two-stage approach that first filters variables by a criterion independent of the test statistic, and then tests only the variables that pass the filter, can provide higher power. We show, however, that some filter/test statistic pairs presented in the literature may lead to loss of type I error control. We describe other pairs that avoid this problem. In an application to microarray data, we found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50%. We also show that this particular statistic pair induces a lower bound on fold-change among the set of discoveries. Independent filtering (using filter/test pairs that are independent under the null hypothesis but correlated under the alternative) is a general approach that can substantially increase the efficiency of experiments.
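The two-stage procedure described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. Several choices here are assumptions made for illustration: the function names (`t_test_pvalue`, `bh_adjust`, `filter_then_test`), the `keep_fraction` and `alpha` parameters, the use of the Benjamini-Hochberg adjustment as the multiple-testing correction, and the crude numerical integration that stands in for a library t-test such as `scipy.stats.ttest_ind`. The key point is that the filter statistic (overall variance) is computed without looking at the group labels.

```python
import math
import statistics


def t_test_pvalue(x, y):
    """Two-sided p-value of the equal-variance two-sample t-test.

    The tail area is obtained by Simpson integration of the Student t
    density, only to keep this sketch free of third-party dependencies.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / df
    t = abs(statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled * (1 / nx + 1 / ny))

    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    n = 2000  # even number of Simpson intervals on [0, t]
    h = t / n
    s = pdf(0.0) + pdf(t) + sum(
        (4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
    central = 2 * s * h / 3  # P(|T| <= t)
    return max(0.0, 1.0 - central)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_test(data, group_a, group_b, keep_fraction=0.5, alpha=0.1):
    """Stage 1: keep the most variable genes, ignoring group labels.
    Stage 2: t-test and adjust for multiplicity among the kept genes only.

    data maps a gene name to its list of measurements; group_a and
    group_b are lists of sample indices (all hypothetical conventions).
    """
    overall_var = {g: statistics.variance(v) for g, v in data.items()}
    n_keep = max(1, int(len(data) * keep_fraction))
    kept = sorted(data, key=overall_var.get, reverse=True)[:n_keep]
    pvals = [
        t_test_pvalue([data[g][i] for i in group_a],
                      [data[g][i] for i in group_b])
        for g in kept
    ]
    adjusted = bh_adjust(pvals)
    return [g for g, q in zip(kept, adjusted) if q <= alpha]
```

Calling `filter_then_test(data, group_a, group_b, keep_fraction=1.0)` disables the filter, so the same function can be used to compare the number of discoveries with and without filtering on a given data set. Because fewer hypotheses enter the adjustment after filtering, each surviving p-value is penalized less, which is where the gain in power comes from.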
