Incorporating gene functions as priors in model-based clustering of microarray gene expression data
Open Access
- 24 January 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (7) , 795-801
- https://doi.org/10.1093/bioinformatics/btl011
Abstract
Motivation: Cluster analysis of gene expression profiles has been widely applied to clustering genes for gene function discovery, and many approaches have been proposed. The rationale is that genes with the same biological function, or involved in the same biological process, are more likely to co-express and hence to form a cluster with similar gene expression patterns. However, most existing methods, including model-based clustering, ignore known gene functions in clustering.
Results: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions as prior probabilities in model-based clustering. In contrast to the global mixture model applied to all genes in standard model-based clustering, we use a stratified mixture model: one stratum corresponds to the genes of unknown function, while each of the others corresponds to the genes sharing the same biological function or pathway. Genes from the same stratum are assumed to have the same prior probability of coming from a cluster, while genes from different strata are allowed to have different prior probabilities of coming from the same cluster. We derive a simple EM algorithm that can be used to fit the stratified model. A simulation study and an application to gene function prediction demonstrate the advantage of our proposal over the standard method.
Contact: weip@biostat.umn.edu
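The stratified mixture idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Gaussian components with diagonal covariances shared across all strata, and mixing proportions (the prior probabilities) estimated separately for each stratum. The function name `fit_stratified_mixture`, its arguments, and the initialization scheme are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_stratified_mixture(X, strata, n_clusters, n_iter=100, seed=0):
    """EM sketch: shared Gaussian components, stratum-specific mixing proportions.

    X       : (n_genes, n_conditions) expression matrix
    strata  : (n_genes,) integer stratum label per gene, 0..H-1
              (by assumption here, one label is reserved for unknown function)
    Returns : priors (H, K), means (K, d), variances (K, d), resp (n, K)
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    strata = np.asarray(strata, dtype=int)
    n, d = X.shape
    n_strata = int(strata.max()) + 1

    # Hypothetical initialization: random genes as centers, pooled variances,
    # uniform prior probabilities in every stratum.
    means = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (n_clusters, 1))
    priors = np.full((n_strata, n_clusters), 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, using each gene's
        # stratum-specific prior in place of a single global prior.
        diff = X[:, None, :] - means[None, :, :]
        log_dens = -0.5 * (
            np.log(2.0 * np.pi * variances)[None] + diff**2 / variances[None]
        ).sum(axis=2)
        log_post = np.log(priors[strata]) + log_dens
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step, priors: average the posteriors within each stratum.
        for h in range(n_strata):
            members = strata == h
            if members.any():
                priors[h] = resp[members].mean(axis=0)
        priors = np.clip(priors, 1e-12, None)
        priors /= priors.sum(axis=1, keepdims=True)

        # M-step, components: pooled over all genes, as in a standard mixture.
        weight = resp.sum(axis=0) + 1e-12
        means = (resp.T @ X) / weight[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = (resp[:, :, None] * diff**2).sum(axis=0) / weight[:, None] + 1e-6

    return priors, means, variances, resp
```

With a single stratum, the sketch reduces to an ordinary Gaussian mixture fitted by EM. The only difference in the stratified case is that each row of `priors` is updated from the genes in its own stratum, so annotated genes can pull the prior of their stratum toward the clusters they tend to occupy.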