Mixed-model coexpression: calculating gene coexpression while accounting for expression heterogeneity
Open Access
- 14 June 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (13) , i288-i294
- https://doi.org/10.1093/bioinformatics/btr221
Abstract
Motivation: The analysis of gene coexpression is at the core of many types of genetic analysis. The coexpression between two genes can be calculated by using a traditional Pearson's correlation coefficient. However, unobserved confounding effects may cause inflation of the Pearson's correlation so that uncorrelated genes appear correlated. Many general methods have been suggested, which aim to remove the effects of confounding from gene expression data. However, the residual confounding which is not accounted for by these generic correction procedures has the potential to induce correlation between genes. Therefore, a method that specifically aims to calculate gene coexpression between gene expression arrays, while accounting for confounding effects, is desirable.
Results: In this article, we present a statistical model for calculating gene coexpression called mixed model coexpression (MMC), which models coexpression within a mixed model framework. Confounding effects are expected to be encoded in the matrix representing the correlation between arrays, the inter-sample correlation matrix. By conditioning on the information in the inter-sample correlation matrix, MMC is able to produce gene coexpressions that are not influenced by global confounding effects and thus significantly reduce the number of spurious coexpressions observed. We applied MMC to both human and yeast datasets and show it is better able to effectively prioritize strong coexpressions when compared to a traditional Pearson's correlation and a Pearson's correlation applied to data corrected with surrogate variable analysis (SVA).
Availability: The method is implemented in the R programming language and may be found at http://genetics.cs.ucla.edu/mmc.
Contact: nfurlott@cs.ucla.edu; eeskin@cs.ucla.edu
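The inflation described in the abstract can be illustrated with a small simulation. The sketch below is not MMC and is not taken from the authors' R implementation. It assumes a single hidden confounder that shifts every gene in a sample, and it uses a deliberately simple stand-in correction: the per-sample mean of a set of background genes is treated as a proxy for the global signal and regressed out of each gene before the correlation is computed. All names (`simulate`, `residualize`, `global_signal`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(y, z):
    """Residuals of y after a least-squares fit on z (with intercept)."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    szz = sum((a - mz) ** 2 for a in z)
    slope = szy / szz
    return [(b - my) - slope * (a - mz) for a, b in zip(z, y)]


def simulate(n_samples=2000, n_background=50, loading=2.0, seed=0):
    """Simulate genes that share nothing except a hidden per-sample effect.

    Assumption for illustration: every gene equals loading * hidden + noise,
    so any correlation between two genes is purely due to confounding.
    """
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n_samples)]

    def gene():
        return [loading * h + rng.gauss(0, 1) for h in hidden]

    gene_a, gene_b = gene(), gene()
    background = [gene() for _ in range(n_background)]
    # Per-sample mean over background genes: a crude proxy for the
    # global confounding signal shared across the whole array.
    global_signal = [
        sum(g[i] for g in background) / n_background for i in range(n_samples)
    ]
    return gene_a, gene_b, global_signal


gene_a, gene_b, global_signal = simulate()

# Naive coexpression: inflated by the shared hidden effect.
r_naive = pearson(gene_a, gene_b)

# Stand-in correction: remove the estimated global signal, then correlate.
r_adjusted = pearson(
    residualize(gene_a, global_signal),
    residualize(gene_b, global_signal),
)

print(f"naive Pearson r:    {r_naive:.3f}")
print(f"adjusted Pearson r: {r_adjusted:.3f}")
```

In this toy setting the two genes have no direct relationship, so the naive correlation is entirely spurious and the adjusted one should fall close to zero. MMC, as described in the abstract, instead conditions on the full inter-sample correlation matrix within a mixed model; the regression step here only conveys the general idea of accounting for a global effect shared across samples.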