Clustering Objects on Subsets of Attributes (with Discussion)
Open Access
- 13 October 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Journal of the Royal Statistical Society Series B: Statistical Methodology
- Vol. 66 (4), 815-849
- https://doi.org/10.1111/j.1467-9868.2004.02059.x
Abstract
Summary. A new procedure is proposed for clustering attribute value data. When used in conjunction with conventional distance-based clustering algorithms this procedure encourages those algorithms to detect automatically subgroups of objects that preferentially cluster on subsets of the attribute variables rather than on all of them simultaneously. The relevant attribute subsets for each individual cluster can be different and partially (or completely) overlap with those of other clusters. Enhancements for increasing sensitivity for detecting especially low cardinality groups clustering on a small subset of variables are discussed. Applications in different domains, including gene expression arrays, are presented.
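The abstract does not spell out the procedure itself, so the following is only a toy illustration of the underlying idea, not the authors' method. It shows why a group of objects that agree on a subset of attributes can be hidden under an equally weighted distance and become visible when the distance is weighted toward that subset. The data, the weighted L1 distance, the hand-picked weights, and the names `weighted_distance`, `mean_pairwise`, and `separation` are all assumptions made for this sketch; in particular, the weights here are fixed by hand rather than estimated from the data.

```python
from itertools import combinations


def weighted_distance(x, y, weights):
    """Weighted L1 distance between two objects: sum_k w_k * |x_k - y_k|."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


def mean_pairwise(objects, weights):
    """Average weighted distance over all pairs within one set of objects."""
    pairs = list(combinations(objects, 2))
    return sum(weighted_distance(x, y, weights) for x, y in pairs) / len(pairs)


# Made-up data: these three objects agree closely on attributes 0 and 1 only;
# attributes 2 and 3 vary widely and act as noise for this group.
group = [
    [1.0, 2.0, 9.0, 0.0],
    [1.1, 2.1, 0.0, 8.0],
    [0.9, 1.9, 5.0, 4.0],
]
# Made-up background objects with no shared structure.
others = [
    [6.0, 9.0, 1.0, 7.0],
    [3.0, 5.0, 8.0, 2.0],
    [9.0, 0.0, 4.0, 5.0],
]

equal = [0.25, 0.25, 0.25, 0.25]   # every attribute counts the same
subset = [0.5, 0.5, 0.0, 0.0]      # hand-picked: only attributes 0 and 1 count


def separation(weights):
    """Ratio of mean group-to-background distance to mean within-group distance.

    A larger ratio means the group stands out more clearly to any
    distance-based clustering algorithm that consumes these distances.
    """
    within = mean_pairwise(group, weights)
    across = sum(
        weighted_distance(g, o, weights) for g in group for o in others
    ) / (len(group) * len(others))
    return across / within


if __name__ == "__main__":
    print("separation with equal weights: ", separation(equal))
    print("separation with subset weights:", separation(subset))
```

With these toy values the group is far better separated under the subset weights than under equal weights. A matrix of such weighted distances could then be handed to any conventional distance-based clustering routine, which is the role the abstract assigns to the proposed procedure.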
Funding Information
- Department of Energy (DE-AC03-76SF00515)
- National Science Foundation (DMS-97-64431)