Independent filtering increases detection power for high-throughput experiments
- 11 May 2010
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 107 (21), 9546-9551
- https://doi.org/10.1073/pnas.0914005107
Abstract
With high-dimensional data, variable-by-variable statistical testing is often used to select variables whose behavior differs across conditions. Such an approach requires adjustment for multiple testing, which can result in low statistical power. A two-stage approach that first filters variables by a criterion independent of the test statistic, and then tests only the variables that pass the filter, can provide higher power. We show that use of some filter/test statistic pairs presented in the literature may, however, lead to loss of type I error control. We describe other pairs that avoid this problem. In an application to microarray data, we found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50%. We also show that this particular statistic pair induces a lower bound on fold-change among the set of discoveries. Independent filtering, which uses filter/test pairs that are independent under the null hypothesis but correlated under the alternative, is a general approach that can substantially increase the efficiency of experiments.
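The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the filter statistic is the overall (across all samples) variance of each row, that per-row p-values from a t-test have already been computed elsewhere, and that the multiple-testing adjustment is the Benjamini-Hochberg procedure (the abstract does not name a specific adjustment). The function names, the parameters `theta` (fraction of rows filtered out) and `alpha`, and the toy data and p-values are all hypothetical.

```python
from statistics import variance


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_test(data, pvalues, theta=0.5, alpha=0.1):
    """Stage 1: drop the fraction `theta` of rows with the lowest overall variance.
    Stage 2: adjust only the surviving rows' p-values and report discoveries."""
    overall_var = [variance(row) for row in data]  # ignores group labels
    n_drop = int(theta * len(data))
    ranked = sorted(range(len(data)), key=lambda i: overall_var[i])
    kept = sorted(ranked[n_drop:])
    adjusted = bh_adjust([pvalues[i] for i in kept])
    return [i for i, q in zip(kept, adjusted) if q <= alpha]


# Hypothetical toy input: 8 rows ("genes") measured in 6 samples each.
data = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    [2.0, 2.1, 1.9, 0.5, 0.4, 0.6],
    [1.0, 2.0, 0.5, 1.5, 0.8, 1.9],
    [0.2, 1.4, 0.9, 1.3, 0.3, 1.0],
    [1.0, 1.1, 1.0, 1.1, 1.0, 1.1],
    [0.5, 0.5, 0.6, 0.5, 0.6, 0.5],
    [2.0, 2.0, 2.1, 2.0, 2.0, 2.1],
    [1.5, 1.5, 1.5, 1.6, 1.5, 1.5],
]
# Hypothetical p-values, assumed to come from a per-row t-test run elsewhere.
pvalues = [0.01, 0.02, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

no_filter = filter_then_test(data, pvalues, theta=0.0, alpha=0.05)
with_filter = filter_then_test(data, pvalues, theta=0.5, alpha=0.05)
print(no_filter)    # [] : adjusting over all 8 rows yields no discoveries
print(with_filter)  # [0, 1] : adjusting over the 4 retained rows yields two
```

In this toy case the filter removes four low-variance rows, so the adjustment is applied to four hypotheses instead of eight and the two smallest p-values pass the threshold. The gain depends on the filter statistic being independent of the test statistic under the null hypothesis; as the abstract notes, other filter/test pairs may lose type I error control.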