Heading Down the Wrong Pathway: on the Influence of Correlation within Gene Sets
Open Access
- 18 October 2010
- journal article
- research article
- Published by Springer Nature in BMC Genomics
- Vol. 11 (1), 1-10
- https://doi.org/10.1186/1471-2164-11-574
Abstract
Background: Analysis of microarray experiments often involves testing for the overrepresentation of pre-defined sets of genes among lists of genes deemed individually significant. Most popular gene set testing methods assume the independence of genes within each set, an assumption that is seriously violated, as extensive correlation between genes is a well-documented phenomenon.

Results: We conducted a meta-analysis of over 200 datasets from the Gene Expression Omnibus in order to demonstrate the practical impact of strong gene correlation patterns that are highly consistent across experiments. We show that a common gene set testing procedure based on the independence assumption produces very high false positive rates when applied to datasets for which treatment groups have been randomized, and that gene sets with high internal correlation are more likely to be declared significant. A reanalysis of the same datasets using an array resampling approach properly controls false positive rates, leading to more parsimonious and high-confidence gene set findings, which should facilitate pathway-based interpretation of the microarray data.

Conclusions: These findings call into question many of the gene set testing results in the literature and argue strongly for the adoption of resampling-based gene set testing criteria in the peer-reviewed biomedical literature.
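The contrast drawn in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure or data: it assumes synthetic Gaussian expression values, a single gene set whose members share a latent factor (pairwise correlation `rho`), group labels that carry no true effect, and the mean t-statistic of the set as a simplified stand-in for an overrepresentation statistic. All function names and parameter values are hypothetical. It compares a null distribution built from random gene sets (which treats genes as independent) with one built by permuting array labels (which preserves the correlation between genes).

```python
import numpy as np


def t_stats(expr, labels):
    """Welch t-statistic per gene; rows are genes, columns are arrays."""
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se


def gene_sampling_p(expr, labels, set_idx, rng, n_draws=199):
    """Null from random gene sets of equal size (assumes independent genes)."""
    t = t_stats(expr, labels)
    observed = abs(t[set_idx].mean())
    null = np.array([
        abs(t[rng.choice(t.size, size=set_idx.size, replace=False)].mean())
        for _ in range(n_draws)
    ])
    return (1 + np.sum(null >= observed)) / (1 + n_draws)


def array_resampling_p(expr, labels, set_idx, rng, n_perm=199):
    """Null from permuted array labels (keeps gene-gene correlation intact)."""
    sub = expr[set_idx]  # only the set's genes are needed for this statistic
    observed = abs(t_stats(sub, labels).mean())
    null = np.array([
        abs(t_stats(sub, rng.permutation(labels)).mean())
        for _ in range(n_perm)
    ])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


def simulate_false_positive_rates(n_trials=200, n_genes=500, n_arrays=20,
                                  set_size=25, rho=0.6, alpha=0.05, seed=0):
    """Fraction of null datasets in which each test calls the set significant."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_arrays // 2)  # two equal-sized groups
    set_idx = np.arange(set_size)              # first genes form the set
    hits_gene = hits_array = 0
    for _ in range(n_trials):
        expr = rng.standard_normal((n_genes, n_arrays))
        # Give set members a shared latent factor: pairwise correlation rho,
        # unit variance, and no dependence on the group labels.
        factor = rng.standard_normal(n_arrays)
        expr[set_idx] = np.sqrt(rho) * factor + np.sqrt(1 - rho) * expr[set_idx]
        hits_gene += gene_sampling_p(expr, labels, set_idx, rng) <= alpha
        hits_array += array_resampling_p(expr, labels, set_idx, rng) <= alpha
    return hits_gene / n_trials, hits_array / n_trials


fpr_gene, fpr_array = simulate_false_positive_rates()
print(f"gene-sampling false positive rate:    {fpr_gene:.2f}")
print(f"array-resampling false positive rate: {fpr_array:.2f}")
```

Under these assumptions, the gene-sampling test should reject far more often than the nominal 5% level, because the variance of a mean over correlated t-statistics is much larger than random gene sets suggest, while the label-permutation test should stay close to the nominal level. Setting `rho=0.0` removes the correlation and is a useful check that the two approaches then behave similarly.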