Clustering microarray gene expression data using weighted Chinese restaurant process
Open Access
- 9 June 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (16), 1988-1997
- https://doi.org/10.1093/bioinformatics/btl284
Abstract
Motivation: Clustering microarray gene expression data is a powerful tool for elucidating co-regulatory relationships among genes. Many different clustering techniques have been successfully applied and the results are promising. However, the substantial fluctuation in microarray data, the lack of knowledge of the number of clusters, and the complex regulatory mechanisms underlying biological systems make the clustering problem tremendously challenging.
Results: We devised an improved model-based Bayesian approach to cluster microarray gene expression data. Cluster assignment is carried out by an iterative weighted Chinese restaurant seating scheme, so that the optimal number of clusters is determined simultaneously with cluster assignment. The predictive updating technique was applied to improve the efficiency of the Gibbs sampler. An additional step is added during reassignment to allow genes that display complex correlation relationships, such as time-shifted and/or inverted patterns, to be clustered together. Analysis of a real dataset showed that as many as 30% of the significant genes clustered in the same group display complex relationships with the consensus pattern of the cluster. Other notable features include automatic handling of missing data and quantitative measures of cluster strength and assignment confidence. Synthetic and real microarray gene expression datasets were analyzed to demonstrate the method's performance.
Availability: A computer program named Chinese restaurant cluster (CRC) has been developed based on this algorithm. The program can be downloaded at
Contact: qin@umich.edu
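The weighted Chinese restaurant seating scheme described in the abstract can be illustrated with a minimal sketch. It is not the CRC program. It assumes a simple Gaussian model with known noise variance `sigma2` and prior variance `tau2`, and the function names (`log_predictive`, `seating_probabilities`, `gibbs_sweep`) and the concentration parameter `alpha` are hypothetical choices made for illustration. A gene joins an existing cluster with weight proportional to the cluster size times the predictive likelihood of its profile, or opens a new cluster with weight proportional to `alpha` times the prior predictive likelihood. The time-shift and inversion step and the missing-data handling are not included.

```python
import numpy as np

def log_predictive(x, members, sigma2=1.0, tau2=1.0):
    """Log predictive density of profile x given the profiles already in a cluster.

    Assumed model (illustrative only): each time point has a cluster mean
    drawn from N(0, tau2), and observations are N(mean, sigma2).
    `members` has shape (n, T); n may be 0 for an empty (new) cluster.
    """
    n = members.shape[0]
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)         # posterior variance of the mean
    post_mean = post_var * members.sum(axis=0) / sigma2  # posterior mean per time point
    var = post_var + sigma2                             # predictive variance
    return float(np.sum(-0.5 * np.log(2 * np.pi * var)
                        - 0.5 * (x - post_mean) ** 2 / var))

def seating_probabilities(x, clusters, alpha=1.0):
    """Probabilities of seating x at each existing cluster or at a new one (last entry)."""
    empty = np.empty((0, x.shape[0]))
    log_w = [np.log(c.shape[0]) + log_predictive(x, c) for c in clusters]
    log_w.append(np.log(alpha) + log_predictive(x, empty))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())                     # stabilised normalisation
    return w / w.sum()

def gibbs_sweep(data, labels, alpha, rng):
    """One pass that removes each gene in turn and reseats it."""
    labels = labels.copy()
    for i in range(len(data)):
        others = np.arange(len(data)) != i
        ids = np.unique(labels[others])
        clusters = [data[others & (labels == k)] for k in ids]
        probs = seating_probabilities(data[i], clusters, alpha)
        choice = rng.choice(len(probs), p=probs)
        # The last index means "open a new cluster" with an unused label.
        labels[i] = ids[choice] if choice < len(ids) else labels.max() + 1
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: two groups of 10 genes over 6 time points.
    data = np.vstack([rng.normal(3.0, 0.5, (10, 6)),
                      rng.normal(-3.0, 0.5, (10, 6))])
    labels = np.zeros(len(data), dtype=int)
    for _ in range(20):
        labels = gibbs_sweep(data, labels, alpha=1.0, rng=rng)
    print("clusters found:", len(np.unique(labels)))
```

Because a new cluster is always one of the seating options, the number of clusters is not fixed in advance and changes as the sweeps proceed, which is the property the abstract highlights.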