FREM

4 November 2002

proceedings article
Published by Association for Computing Machinery (ACM)

p. 590-599
https://doi.org/10.1145/584792.584889

Abstract

Clustering is a fundamental data mining technique. This article presents an improved EM algorithm for clustering large data sets that exhibit high dimensionality, noise, and zero-variance problems. The algorithm incorporates improvements that increase both solution quality and speed. In general, the algorithm can find a good clustering solution in three scans over the data set. Alternatively, it can be run until it converges. The algorithm has a few parameters that are easy to set and have defaults for most cases. The proposed algorithm is compared against the standard EM algorithm and the On-Line EM algorithm.
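
For orientation, the standard EM baseline mentioned in the abstract can be sketched as below. This is not the FREM algorithm and reproduces none of its improvements. It is a minimal illustration of plain EM for a mixture of Gaussians with diagonal covariances, where each iteration makes one full scan over the data. The function name `em_diagonal_gmm`, the parameter names, and the `var_floor` guard against zero variance are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def em_diagonal_gmm(points, k, n_iters=3, var_floor=1e-6, seed=0):
    """Plain EM for k Gaussians with diagonal covariances (illustrative only).

    Each iteration is one full scan over `points`. `var_floor` is an assumed
    guard that keeps every variance strictly positive.
    Returns (weights, means, variances, loglik), where loglik is the
    log-likelihood measured during the last scan, before its M step.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Initialise means from random points and variances from the global spread.
    means = [list(p) for p in rng.sample(points, k)]
    gmean = [sum(p[j] for p in points) / n for j in range(d)]
    gvar = [max(sum((p[j] - gmean[j]) ** 2 for p in points) / n, var_floor)
            for j in range(d)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    loglik = float("-inf")

    for _ in range(n_iters):
        # Sufficient statistics accumulated during one scan.
        resp_sum = [0.0] * k
        lin = [[0.0] * d for _ in range(k)]
        sq = [[0.0] * d for _ in range(k)]
        loglik = 0.0
        for p in points:
            # E step: log(weight * density) per cluster, normalised stably.
            logp = []
            for c in range(k):
                s = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    diff = p[j] - means[c][j]
                    s -= 0.5 * (math.log(2 * math.pi * v) + diff * diff / v)
                logp.append(s)
            m = max(logp)
            total = sum(math.exp(lp - m) for lp in logp)
            loglik += m + math.log(total)
            for c in range(k):
                r = math.exp(logp[c] - m) / total
                resp_sum[c] += r
                for j in range(d):
                    lin[c][j] += r * p[j]
                    sq[c][j] += r * p[j] * p[j]
        # M step: re-estimate parameters from the accumulated statistics.
        for c in range(k):
            if resp_sum[c] <= 0.0:
                continue  # leave an empty cluster unchanged
            weights[c] = resp_sum[c] / n
            for j in range(d):
                mu = lin[c][j] / resp_sum[c]
                means[c][j] = mu
                variances[c][j] = max(sq[c][j] / resp_sum[c] - mu * mu,
                                      var_floor)
    return weights, means, variances, loglik


# Usage: two groups along the first dimension; the second dimension is
# constant, so its sample variance is zero and the floor takes effect.
data = [(x * 0.1, 5.0) for x in range(20)] + \
       [(10 + x * 0.1, 5.0) for x in range(20)]
weights, means, variances, loglik = em_diagonal_gmm(data, k=2, n_iters=3)
```

Setting `n_iters=3` only echoes the three-scan budget quoted in the abstract. Plain EM started from random points carries no guarantee of reaching a good solution in that many passes, which is the gap the paper's improvements are aimed at.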
