A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data
Open Access
- 28 April 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (13), 3025-3033
- https://doi.org/10.1093/bioinformatics/bti466
Abstract
Motivation: Accurate subcategorization of tumour types through gene-expression profiling requires analytical techniques that estimate the number of categories or clusters rigorously and reliably. Parametric mixture modelling provides a natural setting to address this problem.
Results: We compare a criterion for model selection that is derived from a variational Bayesian framework with a popular alternative based on the Bayesian information criterion. Using simulated data, we show that the variational Bayesian method is more accurate in finding the true number of clusters in situations that are relevant to current and future microarray studies. We also compare the two criteria using freely available tumour microarray datasets and show that the variational Bayesian method is more sensitive in capturing biologically relevant structure.
Availability: We have developed an R-package vabayelMix, available from www.cran.r-project.org, that implements the algorithm described in this paper.
Contact: aet21@cam.ac.uk
Supplementary information: http://bioinformatics.oxfordjournals.org
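For orientation, the Bayesian information criterion used as the baseline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation or the vabayelMix API: the function names are hypothetical, the diagonal-covariance Gaussian mixture is an assumption (the abstract does not state the parameterization), and the log-likelihood values in the usage section are made-up inputs chosen only to exercise the functions. The variational Bayesian criterion is not sketched, since the abstract does not give its form.

```python
import math


def n_free_params(k, d):
    """Free parameters of a k-component Gaussian mixture in d dimensions.

    Assumes diagonal covariances (an assumption made for this sketch).
    """
    means = k * d
    variances = k * d
    weights = k - 1  # mixing weights sum to one, so one is determined
    return means + variances + weights


def bic(log_lik, k, d, n):
    """Schwarz's criterion in the form -2 ln L + p ln n (lower is better)."""
    return -2.0 * log_lik + n_free_params(k, d) * math.log(n)


def select_k(log_liks, d, n):
    """Return the k with the smallest BIC.

    log_liks maps each candidate k to the maximised log-likelihood of the
    fitted k-component mixture (fitting itself is outside this sketch).
    """
    return min(log_liks, key=lambda k: bic(log_liks[k], k, d, n))


if __name__ == "__main__":
    # Hypothetical log-likelihoods, NOT results from the paper.
    hypothetical = {1: -520.0, 2: -430.0, 3: -425.0}
    n_samples, n_dims = 100, 2
    for k in sorted(hypothetical):
        print(k, round(bic(hypothetical[k], k, n_dims, n_samples), 2))
    print("selected k:", select_k(hypothetical, n_dims, n_samples))
```

The penalty term grows with the number of free parameters, so a small gain in likelihood from adding a cluster is not enough to lower the criterion; the paper's comparison concerns how well this kind of rule recovers the true number of clusters relative to the variational Bayesian alternative.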