Module networks revisited: computational assessment and prioritization of model predictions

Preprint

12 January 2009

Published on arXiv

https://doi.org/10.48550/arXiv.0901.1544

Abstract

The solution of high-dimensional inference and prediction problems in computational biology is almost always a compromise between mathematical theory and practical constraints such as limited computational resources. As time progresses, computational power increases, but well-established inference methods often remain locked in their initial suboptimal solution. We revisit the approach of Segal et al. (2003) to infer regulatory modules and their condition-specific regulators from gene expression data. In contrast to their direct optimization-based solution, we use a more representative centroid-like solution extracted from an ensemble of possible statistical models to explain the data. The ensemble method automatically selects a subset of the most informative genes and builds a quantitatively better model for them. Genes that cluster together in the majority of models produce functionally more coherent modules. Regulators that are consistently assigned to a module are more often supported by the literature, but a single model always contains many regulator assignments not supported by the ensemble. Reliably detecting condition-specific or combinatorial regulation is particularly hard in a single optimum but can be achieved using ensemble averaging.
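
To illustrate the idea of keeping only genes that cluster together in the majority of models, here is a minimal sketch. It is not the authors' implementation: the function names, the toy `ensemble`, the 0.5 threshold, and the use of connected components to form consensus modules are all assumptions made for illustration. Each model in the ensemble is represented simply as a list of module labels, one per gene.

```python
from itertools import combinations


def coclustering_frequency(ensemble):
    """Return, for each gene pair (i, j) with i < j, the fraction of
    models in which both genes carry the same module label.
    Pairs that never co-cluster are omitted from the result."""
    n_genes = len(ensemble[0])
    counts = {}
    for labels in ensemble:
        for i, j in combinations(range(n_genes), 2):
            if labels[i] == labels[j]:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {pair: c / len(ensemble) for pair, c in counts.items()}


def consensus_modules(ensemble, threshold=0.5):
    """Group genes whose pairwise co-clustering frequency exceeds
    `threshold` (assumed cut-off), using union-find to take connected
    components. Genes left on their own are dropped, so the result
    covers only a subset of the genes."""
    n_genes = len(ensemble[0])
    parent = list(range(n_genes))

    def find(x):
        # Follow parent links to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), freq in coclustering_frequency(ensemble).items():
        if freq > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for gene in range(n_genes):
        groups.setdefault(find(gene), []).append(gene)
    return sorted(m for m in groups.values() if len(m) > 1)


# Hypothetical toy ensemble: three models, five genes.
ensemble = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 2],
]
modules = consensus_modules(ensemble)
print(modules)
```

In this toy input, genes 0 and 1 share a module in all three models and genes 2 and 3 in two of three, so both pairs survive the majority cut-off, while gene 4 never co-clusters consistently and is left out. The same counting idea could be applied to regulator-to-module assignments, keeping only those that recur across the ensemble.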

All Related Versions

Version 1, 2009-01-12, ArXiv
Published version: Bioinformatics, 25 (4), 490.
