Gene Expression Prediction by Soft Integration and the Elastic Net—Best Performance of the DREAM3 Gene Expression Challenge
Open Access
- 16 February 2010
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 5 (2) , e9134
- https://doi.org/10.1371/journal.pone.0009134
Abstract
Predicting gene expression is an important endeavour within computational systems biology. It can serve both as a way to explore how drugs affect the system and as a framework for finding which genes are interrelated in a certain process. A practical problem, however, is how to assess and discriminate among the various algorithms that have been developed for this purpose. Therefore, in 2008 the DREAM project issued a challenge for predicting gene expression values, and here we present the best-performing algorithm. We developed the algorithm by exploring various regression schemes with different model selection procedures. It turns out that the most effective scheme is based on least squares with a recently developed penalty term called the “elastic net”. Key components of the algorithm are the integration of expression data from experimental conditions other than those presented for the challenge, and the use of transcription factor binding data to guide the inference process towards known interactions. Also important is a cross-validation procedure in which each form of external data is used only to the extent that it increases the expected performance. Our algorithm demonstrates both that information for predicting gene levels can be extracted from large-scale expression data and that integrating different data sources improves the inference. We believe the former is an important message to those still hesitant about the potential of computational approaches, while the latter is part of an important way forward for the future development of the field of computational systems biology.
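As a rough illustration of the kind of regression the abstract describes, the sketch below fits a least-squares model with an elastic net penalty and picks the penalty strength by cross-validation. It is not the authors' implementation. The objective scaling, the coordinate-descent solver, the interleaved fold scheme, the toy data, and all function names (`elastic_net_fit`, `cv_error`, and so on) are assumptions made for this example. It does not cover the integration of external expression or transcription factor binding data.

```python
# Minimal elastic net sketch in pure Python (illustrative assumptions only).
# Objective: (1/(2n)) * ||y - Xw||^2 + lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2)

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; the L1 part of the update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the objective above (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero predictor carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def predict(X, w):
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


def cv_error(X, y, lam, alpha, k=3):
    """Mean squared prediction error over k interleaved folds."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        w = elastic_net_fit([X[i] for i in train], [y[i] for i in train], lam, alpha)
        preds = predict([X[i] for i in test], w)
        total += sum((y[i] - p) ** 2 for i, p in zip(test, preds))
    return total / n


def select_lambda(X, y, lambdas, alpha, k=3):
    """Return the penalty strength with the lowest cross-validated error."""
    return min(lambdas, key=lambda lam: cv_error(X, y, lam, alpha, k))


# Toy data (hypothetical): the target depends only on the first predictor.
X_demo = [[1, 0], [0, 1], [1, 1], [2, 1], [-1, 2], [0, -1]]
y_demo = [2.0 * row[0] for row in X_demo]
w_demo = elastic_net_fit(X_demo, y_demo, lam=0.5, alpha=1.0)
```

With `alpha=1.0` the penalty reduces to the lasso, so on the toy data the coefficient of the irrelevant second predictor is set to zero while the first is shrunk towards zero. Intermediate values of `alpha` mix in the ridge term, which is what distinguishes the elastic net from the pure lasso.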