Gene set enrichment analysis using linear models and diagnostics
- 11 September 2008
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (22), 2586-2591
- https://doi.org/10.1093/bioinformatics/btn465
Abstract
Motivation: Gene-set enrichment analysis (GSEA) can be greatly enhanced by linear model (regression) diagnostic techniques. Diagnostics can be used to identify outlying or influential samples, and also to evaluate model fit and explore model expansion.
Results: We demonstrate this methodology on an adult acute lymphoblastic leukemia (ALL) dataset, using GSEA based on chromosome-band mapping of genes. Individual residuals, grouped or aggregated by chromosomal loci, indicate problematic samples and potential data-entry errors, and help identify hyperdiploidy as a factor playing a key role in expression for this dataset. Subsequent analysis pinpoints suspected DNA copy number abnormalities of specific samples and chromosomes (most prevalent are chromosomes X, 21 and 14), and also reveals significant expression differences between the hyperdiploid and diploid groups on other chromosomes (most prominently 19, 22, 3 and 13), differences which are apparently not associated with copy number.
Availability: Software for the statistical tools demonstrated in this article is available as Bioconductor package GSEAlm.
Contact: assaf.oron@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
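As a rough illustration of the idea of aggregating per-gene residuals by gene set, the following Python sketch fits a one-covariate linear model to each gene and then sums the residuals over each set, sample by sample. It is a minimal sketch under stated assumptions, not the GSEAlm implementation: the function names, the toy expression values, the group labels and the "band" names are all hypothetical, raw (unstandardized) residuals are used for simplicity, and scaling each sum by the square root of the set size is a choice made here for illustration.

```python
from math import sqrt
from statistics import mean


def group_mean_residuals(expr, groups):
    """Residuals from the per-gene linear model: expression ~ group.

    With a single categorical covariate, the least-squares fitted value
    for a sample is the mean of its group, so no matrix algebra is needed.
    """
    residuals = {}
    for gene, values in expr.items():
        fitted = {
            g: mean(v for v, s in zip(values, groups) if s == g)
            for g in set(groups)
        }
        residuals[gene] = [v - fitted[s] for v, s in zip(values, groups)]
    return residuals


def aggregate_by_set(residuals, gene_sets):
    """Per-sample sum of residuals over each gene set, scaled by sqrt(set size)."""
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in residuals]
        if not present:
            continue  # no measured genes map to this set
        n_samples = len(residuals[present[0]])
        scores[name] = [
            sum(residuals[g][j] for g in present) / sqrt(len(present))
            for j in range(n_samples)
        ]
    return scores


# Hypothetical toy data: 3 genes x 6 samples, two phenotype groups.
expr = {
    "geneA": [1.0, 1.2, 0.8, 3.0, 2.0, 2.2],
    "geneB": [0.5, 0.7, 0.3, 2.5, 1.5, 1.7],
    "geneC": [1.0, 1.1, 0.9, 1.0, 1.1, 0.9],
}
groups = ["ctl", "ctl", "ctl", "trt", "trt", "trt"]
# Hypothetical gene sets standing in for chromosome bands.
gene_sets = {"band1": ["geneA", "geneB"], "band2": ["geneC"]}

residuals = group_mean_residuals(expr, groups)
scores = aggregate_by_set(residuals, gene_sets)

for name, per_sample in scores.items():
    print(name, [round(x, 3) for x in per_sample])
```

In this toy example, a sample whose genes in one set all sit above their group mean receives a large aggregated residual for that set, which is the kind of pattern that could flag a sample for closer inspection.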