Variable Selection for Model‐Based High‐Dimensional Clustering and Its Application to Microarray Data
- 28 June 2008
- journal article
- Published by Oxford University Press (OUP) in Biometrics
- Vol. 64 (2), 440-448
- https://doi.org/10.1111/j.1541-0420.2007.00922.x
Abstract
Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural "group." Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach.
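The contrast the abstract draws between an elementwise L1 penalty and a grouped penalty can be sketched numerically. The following is a minimal illustration, not the paper's actual estimator: it assumes a toy matrix of cluster means (rows = variables, columns = clusters) and uses an L2-norm group soft-thresholding operator as a stand-in for a grouped penalty. Under L1, each mean parameter is shrunk separately, so a variable can end up "half-selected" (nonzero for one cluster, zero for another); a group operator shrinks all of a variable's means together, so a variable is either kept whole or removed whole.

```python
import numpy as np

def soft_threshold_l1(M, lam):
    # Elementwise (L1) soft-thresholding: each cluster-mean parameter
    # is shrunk toward zero independently of the others.
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

def soft_threshold_group(M, lam):
    # Groupwise (L2-norm) soft-thresholding: all cluster means of one
    # variable (one row of M) are shrunk -- and zeroed -- together.
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return M * scale

# Toy cluster-mean matrix (hypothetical values): rows = variables, K = 2 clusters.
M = np.array([[2.0, -2.0],    # clearly informative variable
              [0.7,  0.05],   # borderline variable
              [0.1,  0.1]])   # noise variable

l1 = soft_threshold_l1(M, 0.5)
grp = soft_threshold_group(M, 0.5)

# L1 leaves the borderline variable "half-selected" (one mean nonzero,
# one zeroed); the group operator keeps or drops each row as a unit.
print(l1)
print(grp)
```

Both operators remove the pure-noise variable, but only the group operator guarantees an all-in or all-out decision per variable, which is the "natural group" structure the abstract refers to.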
This publication has 19 references indexed in Scilit:
- Adaptive Lasso for Cox's proportional hazards model. Biometrika, 2007
- The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 2006
- Variable Selection for Model-Based Clustering. Journal of the American Statistical Association, 2006
- Bayesian Variable Selection in Clustering High-Dimensional Data. Journal of the American Statistical Association, 2005
- Model-Based Clustering, Discriminant Analysis, and Density Estimation. Journal of the American Statistical Association, 2002
- Adaptive Model Selection. Journal of the American Statistical Association, 2002
- Better Subset Regression Using the Nonnegative Garrote. Technometrics, 1995
- Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 1993
- Estimating the Dimension of a Model. The Annals of Statistics, 1978