Semi-supervised learning via penalized mixture model with application to microarray sample classification
Open Access
- 26 July 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (19), 2388-2395
- https://doi.org/10.1093/bioinformatics/btl393
Abstract
Motivation: It is biologically interesting to address whether human blood outgrowth endothelial cells (BOECs) belong to or are closer to large vessel endothelial cells (LVECs) or microvascular endothelial cells (MVECs) based on global expression profiling. An earlier analysis using a hierarchical clustering and a small set of genes suggested that BOECs seemed to be closer to MVECs. By taking advantage of the two known classes, LVEC and MVEC, while allowing BOEC samples to belong to either of the two classes or to form their own new class, we take a semi-supervised learning approach; for high-dimensional data as encountered here, we propose a penalized mixture model with a weighted L1 penalty to realize automatic feature selection while fitting the model.
Results: We applied our penalized mixture model to a combined dataset containing 27 BOEC, 28 LVEC and 25 MVEC samples. Analysis results indicated that the BOEC samples appeared to form their own new class. A simulation study confirmed that, compared with the standard mixture model with or without initial variable selection, the penalized mixture model performed much better in identifying relevant genes and forming corresponding clusters. The penalized mixture model seems to be promising for high-dimensional data with the capability of novel class discovery and automatic feature selection.
Contact: weip@biostat.umn.edu
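The abstract names the technique but gives no equations. As a rough illustration only, the sketch below assumes a Gaussian mixture with a common diagonal covariance, column-centered data, and an L1 penalty `lam * sum(weights * |mu|)` on the component means, fitted by an EM-type algorithm in which labeled samples keep fixed memberships and each mean is soft-thresholded. Under these assumptions, a feature whose means are shrunk to exactly zero in every component no longer influences clustering, which is one way automatic feature selection can arise. The function names, the `lam` value, the all-ones default weights, and the simulated data are all hypothetical; this is not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; entries with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_mixture_em(X, labels, mu0, lam, weights=None, n_iter=100):
    """Semi-supervised EM sketch: Gaussian mixture with an L1 penalty on the means.

    X       : (n, p) data, each column centered to mean zero
    labels  : (n,) ints; component index for labeled samples, -1 for unlabeled
    mu0     : (K, p) initial means (rows beyond the known classes would
              represent candidate new classes)
    lam     : penalty strength (assumed tuning parameter)
    weights : (K, p) penalty weights; defaults to all ones (assumption)
    """
    n, p = X.shape
    K = mu0.shape[0]
    w = np.ones((K, p)) if weights is None else weights
    known = labels >= 0
    idx = np.flatnonzero(known)
    tau = np.zeros((n, K))
    tau[idx, labels[idx]] = 1.0          # labeled samples: fixed memberships
    mu = mu0.copy()
    var = X.var(axis=0) + 1e-8           # common diagonal variance, one per feature
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior memberships for unlabeled samples only.
        # Terms shared by all components (the log-variance part) cancel.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2 / var).sum(axis=2)
        log_post = np.log(pi) - 0.5 * sq
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        tau[~known] = post[~known]
        # Conditional M-steps: proportions, variance (given the previous means),
        # then soft-thresholded means (given the new variance).
        nk = tau.sum(axis=0) + 1e-12
        pi = nk / n
        resid2 = (X[:, None, :] - mu[None, :, :]) ** 2
        var = np.einsum("ik,ikj->j", tau, resid2) / n + 1e-8
        mu_tilde = (tau.T @ X) / nk[:, None]
        mu = soft_threshold(mu_tilde, lam * w * var[None, :] / nk[:, None])
    return pi, mu, var, tau

# Simulated usage: 2 classes, 10 features, only features 0 and 1 informative.
rng = np.random.default_rng(0)
z = np.repeat([0, 1], 100)               # true class of each sample
X = rng.normal(size=(200, 10))
X[z == 0, :2] += 2.0
X[z == 1, :2] -= 2.0
X -= X.mean(axis=0)
labels = z.copy()
labels[50:150] = -1                      # hide half of the labels
mu0 = np.vstack([X[labels == k].mean(axis=0) for k in (0, 1)])
pi, mu, var, tau = penalized_mixture_em(X, labels, mu0, lam=40.0)
selected = np.flatnonzero(np.any(mu != 0.0, axis=0))  # features kept by the penalty
```

In this toy setup the penalty strength is picked by hand; in practice it would need to be chosen by a model selection criterion, and the sketch makes no claim about how the paper does so.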