From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data
Open Access
- 6 August 2007
- journal article
- research article
- Published by Springer Nature in BMC Systems Biology
- Vol. 1 (1), 1-10
- https://doi.org/10.1186/1752-0509-1-37
Abstract
The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations not only confound direct and indirect associations but also provide no means to distinguish between cause and effect. For "causal" analysis, the inference of a directed graphical model is typically required. However, this is rather difficult due to the curse of dimensionality. We propose a simple heuristic for the statistical learning of a high-dimensional "causal" network. The method first converts a correlation network into a partial correlation graph. Subsequently, a partial ordering of the nodes is established by multiple testing of the log-ratio of standardized partial variances. This allows identifying a directed acyclic causal network as a subgraph of the partial correlation network. We illustrate the approach by analyzing a large Arabidopsis thaliana expression data set. The proposed approach is a heuristic algorithm that is based on a number of approximations, such as substituting lower order partial correlations by full order partial correlations. Nevertheless, for small samples and for sparse networks the algorithm not only yields sensible first order approximations of the causal structure in high-dimensional genomic data but is also computationally highly efficient. The method is implemented in the "GeneNet" R package (version 1.2.0), available from CRAN and from http://strimmerlab.org/software/genets/. The software includes an R script for reproducing the network analysis of the Arabidopsis thaliana data.
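The two steps named in the abstract (correlation to partial correlation, then ordering nodes by the log-ratio of standardized partial variances) can be illustrated with a small Python sketch. This is not the GeneNet implementation and makes several simplifying assumptions: it inverts the plain sample correlation matrix, so it needs more samples than variables, whereas high-dimensional data would call for a regularized estimator; it replaces the multiple-testing step with fixed, hypothetical cutoffs (`pcor_cutoff`, `log_ratio_cutoff`); and the orientation convention (an edge points from the node with the larger standardized partial variance to the node with the smaller one) is an assumption of this sketch. All function names are illustrative.

```python
import numpy as np


def partial_correlations_and_spv(data):
    """Return full-order partial correlations and standardized partial variances.

    data: array of shape (n_samples, n_variables); n_samples > n_variables
    is assumed so that the sample correlation matrix is invertible.
    """
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(corr)  # inverse of the correlation matrix
    scale = np.sqrt(np.diag(precision))
    # Partial correlation: negated, rescaled off-diagonal of the inverse.
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    # Standardized partial variance: partial variance divided by variance,
    # which for a correlation matrix is 1 / diagonal of the inverse.
    spv = 1.0 / np.diag(precision)
    return pcor, spv


def directed_edges(pcor, spv, pcor_cutoff=0.2, log_ratio_cutoff=0.1):
    """Orient edges of the partial correlation graph (illustrative cutoffs).

    An edge i -> j is reported when |pcor[i, j]| passes pcor_cutoff and
    log(spv[i] / spv[j]) exceeds log_ratio_cutoff. Edges whose log-ratio
    stays within the cutoff are left out (undirected in this sketch).
    """
    edges = []
    n_vars = pcor.shape[0]
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(pcor[i, j]) < pcor_cutoff:
                continue  # no edge in the partial correlation graph
            log_ratio = np.log(spv[i] / spv[j])
            if log_ratio > log_ratio_cutoff:
                edges.append((i, j))
            elif log_ratio < -log_ratio_cutoff:
                edges.append((j, i))
    return edges


if __name__ == "__main__":
    # Toy data: variables 0 and 1 are independent, variable 2 depends on both.
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=5000)
    x1 = rng.normal(size=5000)
    x2 = x0 + x1 + rng.normal(size=5000)
    pcor, spv = partial_correlations_and_spv(np.column_stack([x0, x1, x2]))
    print(directed_edges(pcor, spv))
```

In the toy data, conditioning on the common child induces a full-order partial correlation between the two independent parents. This is one consequence of using full order instead of lower order partial correlations, the approximation the abstract mentions.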