Alternatives to the Chi-Square Test for Evaluating Rank Histograms from Ensemble Forecasts

1 October 2005

journal article
Published by American Meteorological Society in Weather and Forecasting

Vol. 20 (5), 789–795
https://doi.org/10.1175/waf884.1

Abstract

Rank histograms are a commonly used tool for evaluating an ensemble forecasting system’s performance. Because the sample size is finite, the rank histogram is subject to statistical fluctuations, so a goodness-of-fit (GOF) test is employed to determine if the rank histogram is uniform to within some statistical certainty. Most often, the χ² test is used to test whether the rank histogram is indistinguishable from a discrete uniform distribution. However, the χ² test is insensitive to order and so suffers from troubling deficiencies that may render it unsuitable for rank histogram evaluation. As shown by examples in this paper, more powerful tests, suitable for small sample sizes, and very sensitive to the particular deficiencies that appear in rank histograms are available from the order-dependent Cramér–von Mises family of statistics, in particular, the Watson and Anderson–Darling statistics.
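
The order insensitivity described in the abstract can be illustrated with a small sketch. It is not taken from the paper. The bin counts are hypothetical, the function names `chi_square` and `cvm_w2` are made up for this illustration, and the normalization used for the Cramér–von Mises-type statistic is an assumption. The point is only that a statistic built from cumulative deviations responds to the order of the bins, while the Pearson χ² statistic does not.

```python
from itertools import accumulate


def chi_square(counts):
    """Pearson chi-square statistic against a discrete uniform distribution."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def cvm_w2(counts):
    """Cramer-von Mises-type statistic W^2 against a discrete uniform null.

    Uses cumulative deviations Z_j = sum_{i<=j} (o_i - e_i), so the result
    depends on bin order. The 1/(n*k) normalization is an assumption made
    for this sketch (uniform cell probability 1/k, divided by sample size n).
    """
    n = sum(counts)
    k = len(counts)
    expected = n / k
    z = list(accumulate(c - expected for c in counts))
    return sum(zj ** 2 for zj in z) / (n * k)


# Hypothetical rank histograms: 5 ranks, 100 verification cases each.
sloped = [30, 25, 20, 15, 10]    # monotone slope, as a biased ensemble might give
shuffled = [20, 30, 10, 25, 15]  # the same counts in scrambled order

for name, hist in (("sloped", sloped), ("shuffled", shuffled)):
    print(name, chi_square(hist), cvm_w2(hist))
```

Both histograms give the same χ² value because they contain the same counts, whereas the cumulative statistic is several times larger for the sloped histogram. Turning either statistic into a significance test would additionally require its null distribution, which this sketch does not attempt.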
