Analysis of a Gibbs sampler method for model-based clustering of gene expression data
Open Access
- 22 November 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (2) , 176-183
- https://doi.org/10.1093/bioinformatics/btm562
Abstract
Motivation: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set.
Results: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.
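The summarization step described in the Results (several equivalent clusterings turned into a weighted gene graph, then fuzzy clusters extracted spectrally) can be illustrated with a small Python sketch. It rests on assumptions of our own, not on details given in the abstract: the edge weight between two genes is taken to be the fraction of sampled clusterings in which they share a cluster; fuzzy memberships are read from the leading eigenvector of the weight matrix (computed by power iteration and scaled so the largest membership is 1); and each extracted cluster is removed with the deflation w_ij <- w_ij(1 - m_i)(1 - m_j). The function names are hypothetical. This is not code from GaneSh and may differ from the paper's exact procedure.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def coclustering_weights(clusterings: Sequence[Sequence[int]]) -> Matrix:
    """Assumed edge weight: fraction of clusterings in which genes i and j share a label."""
    n = len(clusterings[0])
    w = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    w[i][j] += 1.0
    k = float(len(clusterings))
    return [[x / k for x in row] for row in w]


def leading_eigenvector(w: Matrix, iters: int = 1000, tol: float = 1e-12) -> List[float]:
    """Power iteration; the result is scaled so its largest entry equals 1."""
    n = len(w)
    x = [1.0] * n  # positive start vector (W is non-negative)
    for _ in range(iters):
        y = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
        scale = max(y)
        if scale == 0.0:  # nothing left to extract
            return [0.0] * n
        y = [v / scale for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def extract_fuzzy_clusters(w: Matrix, n_clusters: int) -> List[List[float]]:
    """Sequentially extract fuzzy clusters; memberships lie in [0, 1]."""
    w = [row[:] for row in w]  # work on a copy
    clusters = []
    for _ in range(n_clusters):
        m = leading_eigenvector(w)
        clusters.append(m)
        n = len(w)
        # Assumed deflation: down-weight genes already covered by this cluster.
        w = [[w[i][j] * (1.0 - m[i]) * (1.0 - m[j]) for j in range(n)] for i in range(n)]
    return clusters


if __name__ == "__main__":
    # Three toy clusterings of five genes (labels are arbitrary per clustering).
    samples = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1]]
    for k, m in enumerate(extract_fuzzy_clusters(coclustering_weights(samples), 2)):
        print(k, [round(v, 3) for v in m])
```

On this toy input the first extracted cluster gives genes 0 to 2 membership 1 and genes 3 and 4 membership 0.5, while the second gives genes 3 and 4 membership 1. Genes 3 and 4 therefore sit in the overlap of the two fuzzy clusters, mirroring the "cores" and "overlaps" described above. For real compendia a sparse eigensolver would replace the pure-Python power iteration.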
Availability: GaneSh, a Java package for coclustering, is available under the terms of the GNU General Public License from our website at http://bioinformatics.psb.ugent.be/software
Contact: yves.vandepeer@psb.ugent.be
Supplementary information: Supplementary data are available on our website at http://bioinformatics.psb.ugent.be/supplementary_data/anjos/gibbs