Breakdown points for maximum likelihood estimators of location–scale mixtures

Open Access

1 August 2004

journal article
Published by Institute of Mathematical Statistics in The Annals of Statistics

Vol. 32 (4) , 1313-1340
https://doi.org/10.1214/009053604000000571

Abstract

ML-estimation based on mixtures of Normal distributions is a widely used tool for cluster analysis. However, a single outlier can make the parameter estimation of at least one of the mixture components break down. Among others, the estimation of mixtures of t-distributions by McLachlan and Peel [Finite Mixture Models (2000) Wiley, New York] and the addition of a further mixture component accounting for “noise” by Fraley and Raftery [The Computer J. 41 (1998) 578–588] were suggested as more robust alternatives. In this paper, the definition of an adequate robustness measure for cluster analysis is discussed and bounds for the breakdown points of the mentioned methods are given. It turns out that the two alternatives, while adding stability in the presence of outliers of moderate size, do not possess a substantially better breakdown behavior than estimation based on Normal mixtures. If the number of clusters s is treated as fixed, r additional points suffice for all three methods to let the parameters of r clusters explode. Only in the case of r=s is this not possible for t-mixtures. The ability to estimate the number of mixture components, for example, by use of the Bayesian information criterion of Schwarz [Ann. Statist. 6 (1978) 461–464], and to isolate gross outliers as clusters of one point, is crucial for an improved breakdown behavior of all three techniques. Furthermore, a mixture of Normals with an improper uniform distribution is proposed to achieve more robustness in the case of a fixed number of components.
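The mechanism behind the abstract's two central claims can be sketched numerically. The following is a minimal illustration, not the paper's derivation: for fixed, finite component parameters, the log-likelihood of a Normal mixture falls without bound as one added point moves away from the data, so a maximizer must eventually devote a component to that point. Adding a constant (improper uniform) density term keeps every point's contribution bounded below. All names (`normal_pdf`, `mixture_loglik`, `noise_weight`, `noise_density`), the data, and the constant 0.01 are assumptions chosen for illustration only.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, weights, means, sds, noise_weight=0.0, noise_density=0.0):
    """Log-likelihood of a Normal mixture at fixed parameters.

    The optional term noise_weight * noise_density is a constant added to
    the density everywhere; it stands in for an improper uniform "noise"
    component (hypothetical parameter names, illustrative values).
    """
    total = 0.0
    for x in data:
        dens = noise_weight * noise_density
        for w, m, s in zip(weights, means, sds):
            dens += w * normal_pdf(x, m, s)
        if dens <= 0.0:
            # The density underflowed to zero: the log-likelihood is -infinity.
            return -math.inf
        total += math.log(dens)
    return total


# Two well-separated groups of made-up observations.
clean = [-0.5, 0.0, 0.4, 4.6, 5.0, 5.3]

for outlier in (20.0, 40.0):
    data = clean + [outlier]
    plain = mixture_loglik(data, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
    noisy = mixture_loglik(
        data, [0.45, 0.45], [0.0, 5.0], [1.0, 1.0],
        noise_weight=0.1, noise_density=0.01,
    )
    print(f"outlier={outlier}: plain={plain:.2f}, with noise term={noisy:.2f}")
```

With the constant term, the outlier contributes at least log(noise_weight * noise_density) wherever it lies, which is why the second log-likelihood barely changes as the outlier moves. This shows only the likelihood at fixed parameters; the breakdown points themselves concern the maximizing estimator and are derived in the paper.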

All Related Versions

Version 1, 2004-10-05, ArXiv

This publication has 35 references indexed in Scilit:

Breakdown and groups
The Annals of Statistics, 2005
Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes
Journal of the American Statistical Association, 1998
An entropy criterion for assessing the number of clusters in a mixture model
Journal of Classification, 1996
The Identification of Multiple Outliers
Journal of the American Statistical Association, 1993
A maximum likelihood methodology for clusterwise linear regression
Journal of Classification, 1988
A constrained EM algorithm for univariate normal mixtures
Journal of Statistical Computation and Simulation, 1986
Mixture models and atypical values
Mathematical Geology, 1984
Estimating the Dimension of a Model
The Annals of Statistics, 1978
A new look at the statistical model identification
IEEE Transactions on Automatic Control, 1974
The Influence Curve and its Role in Robust Estimation
Journal of the American Statistical Association, 1974