Context-Specific Bayesian Clustering for Gene Expression Data

1 April 2002

journal article
research article
Published by Mary Ann Liebert Inc in Journal of Computational Biology

Vol. 9 (2), 169-191
https://doi.org/10.1089/10665270252935403

Abstract

The recent growth in genomic data and measurements of genome-wide expression patterns allows us to apply computational tools to examine gene regulation by transcription factors. In this work, we present a class of mathematical models that help in understanding the connections between transcription factors and functional classes of genes based on genetic and genomic data. Such a model represents the joint distribution of transcription factor binding sites and of expression levels of a gene in a unified probabilistic model. Learning a combined probability model of binding sites and expression patterns enables us to improve the clustering of the genes based on the discovery of putative binding sites and to detect which binding sites and experiments best characterize a cluster. To learn such models from data, we introduce a new search method that rapidly learns a model according to a Bayesian score. We evaluate our method on synthetic data as well as on real-life data and analyze the biological insights it provides. Finally, we demonstrate the applicability of the method to other data analysis problems in gene expression data.
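
To illustrate what a joint model of binding sites and expression levels might look like, the following is a minimal sketch and not the authors' method. It assumes a simple mixture in which, given a gene's cluster, each binding-site indicator is an independent Bernoulli variable and each expression measurement is an independent Gaussian. That factorization, the function names, and all parameter values are assumptions made for illustration. The abstract does not specify the model form, and the search method and Bayesian score it mentions are not shown here.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def cluster_posterior(sites, expr, priors, site_probs, expr_means, expr_vars):
    """Return P(cluster | gene) under the assumed factorized joint model.

    sites : list of 0/1 flags, one per putative binding site (hypothetical input)
    expr  : list of expression levels, one per experiment (hypothetical input)
    """
    log_joint = []
    for k, prior in enumerate(priors):
        lp = math.log(prior)
        # Binding-site term: Bernoulli likelihood of each presence/absence flag.
        for j, present in enumerate(sites):
            p = site_probs[k][j]
            lp += math.log(p if present else 1.0 - p)
        # Expression term: Gaussian likelihood of each measurement.
        for e, x in enumerate(expr):
            lp += log_gaussian(x, expr_means[k][e], expr_vars[k][e])
        log_joint.append(lp)
    # Normalize in log space for numerical stability.
    top = max(log_joint)
    weights = [math.exp(lp - top) for lp in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Toy parameters for two clusters, two binding sites, two experiments.
PRIORS = [0.5, 0.5]
SITE_PROBS = [[0.9, 0.1], [0.1, 0.9]]
EXPR_MEANS = [[2.0, -1.0], [-2.0, 1.0]]
EXPR_VARS = [[1.0, 1.0], [1.0, 1.0]]

# A gene with the first site present and an expression profile near cluster 0.
posterior = cluster_posterior([1, 0], [1.8, -0.7],
                              PRIORS, SITE_PROBS, EXPR_MEANS, EXPR_VARS)
```

In this toy setting both the binding-site pattern and the expression profile point to the first cluster, so the two data sources reinforce each other in the posterior. This is the kind of combined evidence the abstract describes as improving the clustering.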
