Variable Selection for Model‐Based High‐Dimensional Clustering and Its Application to Microarray Data

28 June 2008

journal article
Published by Oxford University Press (OUP) in Biometrics

Vol. 64 (2), 440-448
https://doi.org/10.1111/j.1541-0420.2007.00922.x

Abstract

Variable selection in high‐dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model‐based clustering. Unlike the classical L₁‐norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural “group.” Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L₁‐norm approach.
