Testing Homogeneity in a Mixture Distribution via the L2 Distance Between Competing Models

Abstract
Ascertaining the number of components in a mixture distribution is an interesting and challenging problem for statisticians. Chen, Chen, and Kalbfleisch recently proposed a modified likelihood ratio test (MLRT), which is distribution-free and locally most powerful, asymptotically. In this article we present a new method for testing whether a finite mixture distribution is homogeneous. Our method, the D test, is based on the L2 distance between a fitted homogeneous model and a fitted heterogeneous model. For mixture components from standard parametric families, the D-test statistic has a closed-form expression in terms of parameter estimators, whereas likelihood ratio-type test statistics do not; the latter test statistics are nontrivial functions of both the parameter estimators and the full dataset. The convergence rates of the D-test statistic under a null hypothesis of homogeneity and an alternative hypothesis of heterogeneity are established. The D test is shown to be competitive with the MLRT when the mixture components come from a normal location family. However, in the exponential scale and normal location/scale cases, the relative performances of the D test and the MLRT are mixed. In cases such as these two, we propose to use a weighted D test, in which the measure underlying the L2 distance is changed to accentuate the disparities between the homogeneous and heterogeneous models. Changing the measure is equivalent to computing the D-test statistic using a weighting function or to transforming the data before conducting the D test. Appropriately weighted D tests are competitive in both the exponential scale and normal location/scale cases. After applying the D test to a dataset in which the observations are measurements of firms' financial performances, we conclude with discussion and remarks.
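To illustrate the closed-form claim for the normal case, the following is a minimal sketch rather than the exact D-test statistic defined in the article. It assumes the parameter estimates have already been obtained by some fitting procedure, and it omits any normalization or critical values. It relies on the standard identity that the integral of the product of the N(a, s) and N(b, t) densities (s and t being variances) equals the N(0, s + t) density evaluated at a - b. The function names and the (weights, means, variances) representation are hypothetical choices made for this example.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_cross_term(w1, m1, v1, w2, m2, v2):
    """Integral over the real line of the product of two normal mixture densities.

    Each mixture is given by its weights, means, and variances.
    Uses: integral of N(a, s) * N(b, t) = N(0, s + t) density at a - b.
    """
    total = 0.0
    for a, mu_a, var_a in zip(w1, m1, v1):
        for b, mu_b, var_b in zip(w2, m2, v2):
            total += a * b * normal_pdf(mu_a - mu_b, var_a + var_b)
    return total


def l2_distance_squared(homo, hetero):
    """Squared L2 distance between two fitted normal mixture models.

    Each model is a tuple (weights, means, variances); a homogeneous
    model is simply a one-component mixture. Only parameter estimates
    are needed, not the data (hypothetical helper, not the paper's code).
    """
    return (mixture_cross_term(*hetero, *hetero)
            - 2.0 * mixture_cross_term(*hetero, *homo)
            + mixture_cross_term(*homo, *homo))
```

For example, with made-up estimates, `l2_distance_squared(([1.0], [0.0], [2.0]), ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))` compares a single fitted normal with a fitted two-component location mixture. The result is zero when the two fitted models coincide and grows as they separate, which is the disparity the D test is built on.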