A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data

Open Access

27 September 2005

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 21 (23) , 4280-4288
https://doi.org/10.1093/bioinformatics/bti685

Abstract

Motivation: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, the true FDR is unknown, so an estimated FDR can serve as a criterion for evaluating statistical methods only if it approximates the true FDR well or, at least, does not improperly favor or disfavor any particular method. Permutation methods have become popular for estimating FDR in genomic studies. The purpose of this paper is twofold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased and, if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation method that yields a better FDR estimator, which can in turn serve as a fairer criterion for evaluating statistical methods.

Results: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics are considered: the sample mean, the SAM statistic and Student's t-statistic. The results show that the standard permutation method overestimates FDR. The overestimation is most severe for the sample mean statistic and least severe for the t-statistic, with the SAM statistic lying between the two extremes. This suggests that one has to be cautious when using standard permutation-based FDR estimates to compare statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.

Contact: yangxie@biostat.umn.ed
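For readers unfamiliar with the standard permutation-based FDR estimate discussed in the abstract, the following is a minimal sketch in Python, not the authors' implementation. It assumes a two-group design, uses the difference of group means as the test statistic, and treats every gene as null when counting false positives; the function names (`mean_difference`, `permutation_fdr`), the cutoff and the simulated data are all hypothetical and chosen only for illustration. The modification proposed in the paper is not specified in the abstract and is not reproduced here.

```python
import random
from statistics import mean


def mean_difference(values, labels):
    """Difference of group means (group 1 minus group 0) for one gene."""
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(group1) - mean(group0)


def permutation_fdr(data, labels, cutoff, n_perm=200, seed=0):
    """Sketch of a standard permutation FDR estimate at |statistic| >= cutoff.

    data:   list of per-gene lists, one expression value per array
    labels: 0/1 group label per array
    Assumption: all genes are treated as null when estimating the
    expected number of false positives.
    Returns (estimated FDR, number of genes called significant).
    """
    rng = random.Random(seed)
    observed = [abs(mean_difference(row, labels)) for row in data]
    n_called = sum(stat >= cutoff for stat in observed)
    if n_called == 0:
        return 0.0, 0

    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute group labels across arrays
        count = sum(
            abs(mean_difference(row, shuffled)) >= cutoff for row in data
        )
        null_counts.append(count)

    expected_false = mean(null_counts)  # average null exceedances
    return min(1.0, expected_false / n_called), n_called


# Hypothetical simulated data: 50 genes on 8 arrays (4 per group);
# the first 5 genes are shifted upward in group 1.
sim = random.Random(1)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
data = []
for gene in range(50):
    shift = 10.0 if gene < 5 else 0.0
    data.append([sim.gauss(shift * lab, 1.0) for lab in labels])

fdr, n_called = permutation_fdr(data, labels, cutoff=2.0)
print(f"genes called: {n_called}, estimated FDR: {fdr:.3f}")
```

Because the permuted data still contain the shifted genes, the null counts in this sketch can include exceedances driven by those genes, which is one way to see why the choice of test statistic may matter for such an estimate.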
