Testing Homogeneity in a Mixture Distribution via the L2 Distance Between Competing Models
- 1 June 2004
- journal article
- Published by Taylor & Francis in Journal of the American Statistical Association
- Vol. 99 (466) , 488-498
- https://doi.org/10.1198/016214504000000494
Abstract
Ascertaining the number of components in a mixture distribution is an interesting and challenging problem for statisticians. Chen, Chen, and Kalbfleisch recently proposed a modified likelihood ratio test (MLRT), which is distribution-free and locally most powerful, asymptotically. In this article we present a new method for testing whether a finite mixture distribution is homogeneous. Our method, the D test, is based on the L2 distance between a fitted homogeneous model and a fitted heterogeneous model. For mixture components from standard parametric families, the D-test statistic has a closed-form expression in terms of parameter estimators, whereas likelihood ratio-type test statistics do not; the latter test statistics are nontrivial functions of both the parameter estimators and the full dataset. The convergence rates of the D-test statistic under a null hypothesis of homogeneity and an alternative hypothesis of heterogeneity are established. The D test is shown to be competitive with the MLRT when the mixture components come from a normal location family. However, in the exponential scale and normal location/scale cases, the relative performances of the D test and the MLRT are mixed. In cases such as these two, we propose to use a weighted D test, in which the measure underlying the L2 distance is changed to accentuate the disparities between the homogeneous and heterogeneous models. Changing the measure is equivalent to computing the D-test statistic using a weighting function or to transforming the data before conducting the D test. Appropriately weighted D tests are competitive in both the exponential scale and normal location/scale cases. After applying the D test to a dataset in which the observations are measurements of firms' financial performances, we conclude with discussion and remarks.
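To illustrate why an L2 distance between fitted models can have a closed form in the parameter estimates, the following is a minimal sketch for normal components only. It relies on the standard identity that the integral of the product of two normal densities N(a, s) and N(b, t) equals the N(0, s + t) density evaluated at a - b. The function names and the example parameter values are hypothetical, and the sketch computes only the unweighted squared L2 distance between two given models; it is not the article's D-test statistic, whose exact definition and normalization are not stated in the abstract.

```python
import math


def normal_overlap(mu_a, var_a, mu_b, var_b):
    """Integral over the real line of the product of two normal densities.

    Equals the N(0, var_a + var_b) density evaluated at mu_a - mu_b.
    """
    var = var_a + var_b
    return math.exp(-((mu_a - mu_b) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def l2_distance_sq(weights_f, comps_f, weights_g, comps_g):
    """Squared L2 distance between two finite normal mixtures f and g.

    Each comps_* entry is a (mean, variance) pair; weights_* are the
    mixing proportions. A homogeneous model is a one-component mixture.
    """
    def cross(w1, c1, w2, c2):
        # Integral of the product of two mixtures, expanded term by term.
        return sum(
            a * b * normal_overlap(m1, v1, m2, v2)
            for a, (m1, v1) in zip(w1, c1)
            for b, (m2, v2) in zip(w2, c2)
        )

    return (
        cross(weights_f, comps_f, weights_f, comps_f)
        - 2.0 * cross(weights_f, comps_f, weights_g, comps_g)
        + cross(weights_g, comps_g, weights_g, comps_g)
    )


# Hypothetical fitted models: homogeneous N(0, 1) versus a two-component
# normal location mixture with means -1 and 1 and common variance 1.
homogeneous = ([1.0], [(0.0, 1.0)])
heterogeneous = ([0.5, 0.5], [(-1.0, 1.0), (1.0, 1.0)])
distance_sq = l2_distance_sq(*homogeneous, *heterogeneous)
```

The computation uses only the parameter values, never the raw observations, which is the contrast the abstract draws with likelihood ratio-type statistics.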
This publication has 3 references indexed in Scilit:
- Testing the number of components in a normal mixture. Biometrika, 2001
- Tail Probabilities of the Maxima of Gaussian Random Fields. The Annals of Probability, 1993
- Discrete Parameter Variation: Efficient Estimation of a Switching Regression Model. Econometrica, 1978