Bias in the estimation of false discovery rate in microarray studies
Open Access
- 16 August 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (20) , 3865-3872
- https://doi.org/10.1093/bioinformatics/bti626
Abstract
Motivation: The false discovery rate (FDR) provides a key statistical assessment for microarray studies. Its value depends on the proportion π0 of non-differentially expressed (non-DE) genes. In most microarray studies, many genes have small effects not easily separable from non-DE genes. As a result, current methods often overestimate π0 and FDR, leading to unnecessary loss of power in the overall analysis.
Methods: For the common two-sample comparison we derive a natural mixture model of the test statistic and an explicit bias formula in the standard estimation of π0. We suggest an improved estimation of π0 based on the mixture model and describe a practical likelihood-based procedure for this purpose.
Results: The analysis shows that a large bias occurs when π0 is far from 1 and when the non-centrality parameters of the distribution of the test statistic are near zero. The theoretical result also explains substantial discrepancies between non-parametric and model-based estimates of π0. Simulation studies indicate mixture-model estimates are less biased than standard estimates. The method is applied to breast cancer and lymphoma data examples.
Availability: An R package, OCplus, containing functions to compute π0 based on the mixture model, the resulting FDR and other operating characteristics of microarray data, is freely available at http://www.meb.ki.se/~yudpaw
Contact: yudi.pawitan@meb.ki.se and alexander.ploner@meb.ki.se
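The bias described in the Results can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's procedure and not the OCplus implementation: it assumes z-statistics with unit variance in place of the two-sample t-statistic, a single common shift `delta` for all DE genes, and a simple threshold-based estimate of π0 (the share of p-values above a cut-off `lam`, rescaled by 1/(1 - `lam`)) standing in for the "standard" estimate. All function names and parameter values are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a z-statistic under the standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_pvalues(m, pi0, delta, rng):
    """Draw m z-statistics: a fraction pi0 from N(0, 1), the rest from N(delta, 1)."""
    n_null = int(round(pi0 * m))
    z = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    z += [rng.gauss(delta, 1.0) for _ in range(m - n_null)]
    return [two_sided_p(v) for v in z]


def pi0_threshold_estimate(pvalues, lam=0.5):
    """Share of p-values above lam, rescaled by 1 / (1 - lam) and capped at 1.

    Assumed stand-in for a standard non-parametric estimate of pi0.
    """
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / ((1.0 - lam) * len(pvalues)))


rng = random.Random(2005)  # fixed seed so the run is reproducible
true_pi0 = 0.6             # far from 1, the regime the abstract flags as problematic

# Small effects: DE genes are barely separable from non-DE genes.
small = pi0_threshold_estimate(simulate_pvalues(20000, true_pi0, 0.5, rng))
# Large effects: DE genes are clearly separated from non-DE genes.
large = pi0_threshold_estimate(simulate_pvalues(20000, true_pi0, 4.0, rng))

print(f"true pi0 = {true_pi0}")
print(f"estimate with small effects (delta = 0.5): {small:.3f}")
print(f"estimate with large effects (delta = 4.0): {large:.3f}")
```

With small effects, many DE genes produce large p-values and are counted as if they were non-DE, so the estimate should land well above the true value of 0.6; with large effects it should stay close to 0.6. This matches the qualitative claim in the abstract that the bias grows as the non-centrality parameters approach zero.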