An empirical Bayes approach to inferring large-scale gene association networks
- 12 October 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (6) , 754-764
- https://doi.org/10.1093/bioinformatics/bti062
Abstract
Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure poses a serious challenge in microarray analysis, where the sample size is small compared to the number of genes considered. This renders many standard algorithms for graphical models inapplicable and makes inferring genetic networks an 'ill-posed' inverse problem. We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degrees of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology. Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework for estimating GGMs from microarray data. The simulations show that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
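The two computational ideas named in the abstract, partial correlations as the edge weights of a GGM and false discovery rate control over the candidate edges, can be illustrated with a minimal sketch. This is not the paper's estimator: the `shrinkage` parameter is a hypothetical ridge-style regularizer standing in for the improved small-sample estimates, the p-values are taken as given rather than derived from the exact test with adaptive degrees of freedom, and the standard Benjamini-Hochberg step-up procedure is swapped in for the network search. All function names are illustrative.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr, shrinkage=0.0):
    """Partial correlations from a correlation matrix.

    `shrinkage` in [0, 1] scales the off-diagonal entries toward zero before
    inversion (an assumed, illustrative regularizer, not the paper's method).
    """
    n = len(corr)
    shrunk = [[corr[i][j] if i == j else (1.0 - shrinkage) * corr[i][j]
               for j in range(n)] for i in range(n)]
    omega = invert(shrunk)  # precision (inverse correlation) matrix
    # Standardize the negated precision matrix to get partial correlations.
    return [[1.0 if i == j
             else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]


def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Step-up rule: keep the largest rank whose p-value is under q * rank / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Toy chain A - B - C: A and C are correlated only through B (0.64 = 0.8 * 0.8).
corr = [[1.0, 0.8, 0.64],
        [0.8, 1.0, 0.8],
        [0.64, 0.8, 1.0]]
pcor = partial_correlations(corr)
# pcor[0][2] is numerically close to zero: no direct A-C edge in the GGM.
edges = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9], q=0.05)
```

In the toy chain the marginal correlation between A and C is substantial, yet their partial correlation vanishes once B is conditioned on, which is why GGMs use partial rather than marginal correlations to define edges. With more genes than samples the sample correlation matrix is singular, so some form of regularization is needed before the inversion step can run at all.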