FREM
- 4 November 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 590-599
- https://doi.org/10.1145/584792.584889
Abstract
Clustering is a fundamental data mining technique. This article presents an improved EM algorithm for clustering large data sets that exhibit high dimensionality, noise, and zero-variance problems. The algorithm incorporates improvements that increase both solution quality and speed. In general, it can find a good clustering solution in 3 scans over the data set; alternatively, it can be run until it converges. The algorithm has a few parameters that are easy to set and have defaults for most cases. The proposed algorithm is compared against the standard EM algorithm and the On-Line EM algorithm.
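The abstract does not spell out FREM's specific improvements, so the code below is not the authors' method. It is a minimal sketch of the kind of standard EM baseline the paper compares against: a Gaussian mixture with diagonal covariances, written in plain Python. Several choices here are our own assumptions. The function name `fit_diag_gmm`, the farthest-first initialization, and all defaults are hypothetical. A fixed variance floor (`var_floor`) stands in as one generic guard against zero-variance dimensions, which would otherwise make the Gaussian density undefined. The sufficient statistics (N_c, Σ r·x, Σ r·x²) are accumulated in a single pass per iteration, so `max_iters` bounds the number of scans over the data.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_diag_gmm(data: Sequence[Sequence[float]], k: int, max_iters: int = 100,
                 tol: float = 1e-6, var_floor: float = 1e-3,
                 seed: int = 0) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """Hypothetical sketch: EM for a k-component diagonal Gaussian mixture."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])

    # Farthest-first initialization (an assumption, not taken from the paper).
    means = [list(data[rng.randrange(n)])]
    while len(means) < k:
        far = max(data, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m))
                                          for m in means))
        means.append(list(far))

    # Every component starts with the global per-dimension variance, floored.
    gmean = [sum(x[j] for x in data) / n for j in range(d)]
    gvar = [max(sum((x[j] - gmean[j]) ** 2 for x in data) / n, var_floor) for j in range(d)]
    variances = [gvar[:] for _ in range(k)]
    weights = [1.0 / k] * k

    prev_ll = -math.inf
    for _ in range(max_iters):  # each iteration is exactly one scan over the data
        nk = [0.0] * k
        s1 = [[0.0] * d for _ in range(k)]  # sum of r * x per component and dimension
        s2 = [[0.0] * d for _ in range(k)]  # sum of r * x^2 per component and dimension
        ll = 0.0
        for x in data:
            # E step: log(weight * diagonal Gaussian density) for each component.
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (x[j] - means[c][j]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            total = sum(math.exp(lp - top) for lp in logp)  # log-sum-exp for stability
            ll += top + math.log(total)
            for c in range(k):
                r = math.exp(logp[c] - top) / total  # responsibility of component c
                nk[c] += r
                for j in range(d):
                    s1[c][j] += r * x[j]
                    s2[c][j] += r * x[j] * x[j]
        # M step from the sufficient statistics gathered in this scan.
        for c in range(k):
            nc = max(nk[c], 1e-12)  # guard against an empty component
            weights[c] = nc / n
            for j in range(d):
                mu = s1[c][j] / nc
                means[c][j] = mu
                # Assumption: a fixed floor handles zero (or slightly negative,
                # from rounding) variance; FREM may treat this differently.
                variances[c][j] = max(s2[c][j] / nc - mu * mu, var_floor)
        if ll - prev_ll < tol:  # stop once the log-likelihood stops improving
            break
        prev_ll = ll
    return weights, means, variances
```

As a usage example, two tight clusters around x = 0 and x = 10 whose second coordinate is constant would break an unguarded EM (zero variance in that dimension); with the floor, the fit recovers both centers. Passing `max_iters=3` caps the run at three scans, which only loosely echoes the abstract's 3-scan mode and does not reproduce FREM's solution quality; leaving it large runs EM until the improvement falls below `tol`.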