A multivariate approach for integrating genome-wide expression data and biological knowledge
Open Access
- 28 July 2006
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (19) , 2373-2380
- https://doi.org/10.1093/bioinformatics/btl401
Abstract
Motivation: Several statistical methods that combine analysis of differential gene expression with biological knowledge databases have been proposed for a more rapid interpretation of expression data. However, most such methods are based on a series of univariate statistical tests and do not properly account for the complex structure of gene interactions.
Results: We present a simple yet effective multivariate statistical procedure for assessing the correlation between a subspace defined by a group of genes and a binary phenotype. A subspace is deemed significant if the samples corresponding to different phenotypes are well separated in that subspace. The separation is measured using Hotelling's T² statistic, which captures the covariance structure of the subspace. When the dimension of the subspace is larger than that of the sample space, we project the original data to a smaller orthonormal subspace. We use this method to search through functional pathway subspaces defined by Reactome, KEGG, BioCarta and Gene Ontology. To demonstrate its performance, we apply this method to the data from two published studies, and visualize the results in the principal component space.
Contact: peter_park@harvard.edu
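The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `hotelling_t2_gene_set`, the `n_components` parameter, and the use of PCA (via SVD) as the orthonormal projection are assumptions made for illustration; the abstract states only that the data are projected to a smaller orthonormal subspace.

```python
import numpy as np

def hotelling_t2_gene_set(X, y, n_components=None):
    """Two-sample Hotelling's T^2 for one gene set (illustrative sketch).

    X: (samples x genes) expression matrix restricted to the gene set.
    y: binary phenotype labels, one per sample.
    n_components: optional cap on the projected dimension (hypothetical knob).
    Returns (T^2, number of dimensions used).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    n = X.shape[0]

    # Assumption: PCA on the pooled, centered data supplies the orthonormal
    # basis. Labels are not used when choosing the projection.
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s.max()))

    # The pooled covariance has n - 2 degrees of freedom, so at most n - 2
    # dimensions can be kept if it is to be invertible.
    k = min(rank, n - 2)
    if n_components is not None:
        k = min(k, n_components)
    Z = Xc @ Vt[:k].T  # samples expressed in the reduced subspace

    Z0, Z1 = Z[~y], Z[y]
    n0, n1 = len(Z0), len(Z1)
    d = Z0.mean(axis=0) - Z1.mean(axis=0)  # difference of group means

    # Pooled within-group covariance in the reduced subspace.
    R0 = Z0 - Z0.mean(axis=0)
    R1 = Z1 - Z1.mean(axis=0)
    S = (R0.T @ R0 + R1.T @ R1) / (n - 2)

    t2 = (n0 * n1 / n) * d @ np.linalg.solve(S, d)
    return float(t2), k
```

A larger T² indicates that the two phenotype groups are better separated in the gene-set subspace. Significance could then be assessed with the usual F transformation of T² or with a permutation test over the labels; which of these the paper uses is not stated in the abstract.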