Filtering for increased power for microarray data analysis
Open Access
- 8 January 2009
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 10 (1) , 1-12
- https://doi.org/10.1186/1471-2105-10-11
Abstract
Due to the large number of hypothesis tests performed during the process of routine analysis of microarray data, a multiple testing adjustment is certainly warranted. However, when the number of tests is very large and the proportion of differentially expressed genes is relatively low, the use of a multiple testing adjustment can result in very low power to detect those genes which are truly differentially expressed. Filtering allows for a reduction in the number of tests and a corresponding increase in power. Common filtering methods include filtering by variance, average signal or MAS detection call (for Affymetrix arrays). We study the effects of filtering in combination with the Benjamini-Hochberg method for false discovery rate control and q-value for false discovery rate estimation. Three case studies are used to compare three different filtering methods in combination with the two false discovery rate methods and three different preprocessing methods.
For the case studies considered, filtering by detection call and variance (on the original scale) consistently led to an increase in the number of differentially expressed genes identified. On the other hand, filtering by variance on the log2 scale had a detrimental effect when paired with MAS5 or PLIER preprocessing methods, even when the testing was done on the log2 scale. A simulation study was done to further examine the effect of filtering by variance. We find that filtering by variance leads to higher power, often with a decrease in false discovery rate, when paired with either of the false discovery rate methods considered. This holds regardless of the proportion of genes which are differentially expressed or whether we assume dependence or independence among genes.
The case studies show that both detection call and variance filtering are viable methods of filtering which can increase the number of differentially expressed genes identified. The simulation study demonstrates that when paired with a false discovery rate method, filtering by variance can increase power while still controlling the false discovery rate. Filtering out 50% of probe sets seems reasonable as long as the majority of genes are not expected to be differentially expressed.
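The interaction between filtering and the Benjamini-Hochberg adjustment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy expression values, and the p-values are all hypothetical, and the p-values are simply assumed rather than computed from the toy data. The sketch keeps the 50% of probe sets with the highest overall variance and then applies the Benjamini-Hochberg step-up rule to the reduced set of tests.

```python
import statistics

def variance_filter(rows, keep_fraction=0.5):
    """Return sorted indices of the rows with the highest overall variance."""
    variances = [statistics.variance(r) for r in rows]
    n_keep = max(1, int(len(rows) * keep_fraction))
    ranked = sorted(range(len(rows)), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:n_keep])

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value is at or below rank * alpha / m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])

# Hypothetical toy data: 10 probe sets measured on 4 arrays.
expr = [
    [1, 9, 2, 10], [2, 8, 1, 9], [3, 7, 2, 8], [1, 6, 2, 7],
    [5, 5.1, 5, 5.2], [4, 4.1, 4.2, 4], [2, 7, 3, 9],
    [6, 6.1, 6, 6.2], [3, 3.1, 3, 3.1], [7, 7.2, 7.1, 7],
]
# Hypothetical per-probe-set p-values (assumed, not derived from expr).
pvalues = [0.004, 0.012, 0.02, 0.30, 0.45, 0.50, 0.62, 0.71, 0.85, 0.93]

# Without filtering: 10 tests enter the adjustment.
rejected_all = benjamini_hochberg(pvalues)

# With filtering: only the 5 most variable probe sets are tested.
kept = variance_filter(expr, keep_fraction=0.5)
rejected_local = benjamini_hochberg([pvalues[i] for i in kept])
rejected_filtered = [kept[i] for i in rejected_local]

print("rejected without filtering:", rejected_all)    # [0]
print("rejected after filtering:", rejected_filtered)  # [0, 1, 2]
```

In this toy case the same small p-values clear the threshold more easily once the number of tests drops from 10 to 5, which is the mechanism behind the power gain the abstract reports. The filter uses only the overall variance of each probe set and ignores group labels.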
This publication has 9 references indexed in Scilit:
- Impaired Lung Homeostasis in Neonatal Mice Exposed to Cigarette Smoke. American Journal of Respiratory Cell and Molecular Biology, 2008
- Transcriptomic analysis of the cardiac left ventricle in a rodent model of diabetic cardiomyopathy: molecular snapshot of a severe myocardial disease. Physiological Genomics, 2007
- Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nature Biotechnology, 2006
- Effects of filtering by Present call on analysis of microarray experiments. BMC Bioinformatics, 2006
- A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments. Statistical Applications in Genetics and Molecular Biology, 2006
- A comparative review of estimates of the proportion unchanged genes and the false discovery rate. BMC Bioinformatics, 2005
- Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 2003
- Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 2003
- Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 1995