Analyzing gene expression data in terms of gene sets: methodological issues
- 15 February 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (8), 980-987
- https://doi.org/10.1093/bioinformatics/btm051
Abstract
Motivation: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferential methodology of gene set testing.
Results: We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.
Contact: j.j.goeman@lumc.nl
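The point about the independence assumption can be illustrated with a small simulation. The sketch below is not the simulation reported in the article; it is a minimal illustration under stated assumptions: two groups of subjects with no true differential expression, a gene set whose members share a latent per-subject factor (correlation `rho`), and a gene-sampling test that compares per-gene t-statistics inside the set with those outside it using a normal approximation. All function names and parameter values are hypothetical.

```python
import math
import random


def mean_var(xs):
    """Return the sample mean and unbiased sample variance of a list."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def pooled_t(xs, ys):
    """Two-sample t-statistic with pooled variance."""
    mx, vx = mean_var(xs)
    my, vy = mean_var(ys)
    nx, ny = len(xs), len(ys)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))


def gene_sampling_p(set_stats, rest_stats):
    """Two-sided p-value that treats genes as independent sampling units.

    Assumption: the statistic is referred to a standard normal distribution.
    """
    z = pooled_t(set_stats, rest_stats)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(rho, n_sims=200, n_genes=100, set_size=20,
                        n_per_group=10, alpha=0.05, seed=1):
    """Fraction of null data sets in which the gene-sampling test rejects.

    Genes 0..set_size-1 form the gene set and share a latent subject-level
    factor, giving pairwise correlation rho; the remaining genes are
    independent. Group labels are unrelated to expression, so every
    rejection is a false positive.
    """
    rng = random.Random(seed)
    n_subjects = 2 * n_per_group
    rejections = 0
    for _ in range(n_sims):
        shared = [rng.gauss(0, 1) for _ in range(n_subjects)]
        stats = []
        for g in range(n_genes):
            w = math.sqrt(rho) if g < set_size else 0.0
            noise_w = math.sqrt(1 - w * w)  # keeps each gene at unit variance
            values = [w * shared[i] + noise_w * rng.gauss(0, 1)
                      for i in range(n_subjects)]
            # Per-gene statistic: first half of subjects vs second half.
            stats.append(pooled_t(values[:n_per_group], values[n_per_group:]))
        if gene_sampling_p(stats[:set_size], stats[set_size:]) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("independent genes:", false_positive_rate(rho=0.0))
    print("correlated gene set:", false_positive_rate(rho=0.8))
```

With independent genes the rejection rate should stay near the nominal level, while with a correlated gene set it should rise far above it. The reason is that the gene-sampling model computes its standard error as if the set contained `set_size` independent observations, although the correlated t-statistics carry much less information than that.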