A statistical framework for testing functional categories in microarray data
Open Access
- 1 March 2008
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Applied Statistics
- Vol. 2 (1), 286-315
- https://doi.org/10.1214/07-aoas146
Abstract
Ready access to emerging databases of gene annotation and functional pathways has shifted assessments of differential expression in DNA microarray studies from single genes to groups of genes with shared biological function. This paper takes a critical look at existing methods for assessing the differential expression of a group of genes (functional category), and provides some suggestions for improved performance. We begin by presenting a general framework, in which the set of genes in a functional category is compared to the complementary set of genes on the array. The framework includes tests for overrepresentation of a category within a list of significant genes, and methods that consider continuous measures of differential expression. Existing tests are divided into two classes. Class 1 tests assume gene-specific measures of differential expression are independent, despite overwhelming evidence of positive correlation. Analytic and simulated results are presented that demonstrate Class 1 tests are strongly anti-conservative in practice. Class 2 tests account for gene correlation, typically through array permutation that by construction has proper Type I error control for the induced null. However, both Class 1 and Class 2 tests use a null hypothesis that all genes have the same degree of differential expression. We introduce a more sensible and general (Class 3) null under which the profile of differential expression is the same within the category and complement. Under this broader null, Class 2 tests are shown to be conservative. We propose standard bootstrap methods for testing against the Class 3 null and demonstrate they provide valid Type I error control and more power than array permutation in simulated datasets and real microarray experiments.
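As an illustration of the overrepresentation tests mentioned in the abstract, the sketch below computes a one-sided hypergeometric (Fisher-type) p-value for a category within a list of significant genes. It is a minimal example and not the paper's own procedure: the function name, argument names, and the example counts are hypothetical. Because it treats genes as independent draws, it would presumably fall under Class 1 as the abstract describes it, and so would be subject to the anti-conservative behavior reported there when gene expression is positively correlated.

```python
from math import comb


def overrepresentation_pvalue(n_array, n_category, n_significant, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    Hypothetical argument names (not from the paper):
      n_array       -- total number of genes on the array
      n_category    -- genes on the array that belong to the functional category
      n_significant -- genes declared significant (differentially expressed)
      n_overlap     -- significant genes that also belong to the category

    Assumes genes are independent, which is the Class 1 assumption.
    """
    # The overlap can be at most the smaller of the two margins.
    upper = min(n_category, n_significant)
    # Number of ways to choose the significant list from the whole array.
    total = comb(n_array, n_significant)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_category, i) * comb(n_array - n_category, n_significant - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up counts for illustration only: 10,000 genes on the array,
# 100 in the category, 500 significant, 12 significant and in the category.
p = overrepresentation_pvalue(10_000, 100, 500, 12)
print(f"one-sided overrepresentation p-value: {p:.4g}")
```

A small p-value here only indicates overrepresentation under the independence assumption; the array permutation (Class 2) and bootstrap (Class 3) approaches described in the abstract are aimed at relaxing that assumption and are not reproduced in this sketch.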