Skill Tests and Parametric Statistics for Model Evaluation

Abstract
A series of traditional statistical methods and nonparametric skill tests are assembled into a model statistical analysis program (MSAP) with the intent of evaluating their ability to verify surface water models. Two Lake Erie storm surges are simulated with two different hydrodynamic models, and actual water levels for each storm surge are obtained from monitoring stations surrounding the lake. The actual and computed water levels are analyzed by MSAP. Of the three major categories of statistical tests contained in MSAP, the skill tests are superior in demonstrating a model's ability or inability to predict the most important aspect of a storm‐surge simulation: its creation and die‐off. Time‐series statistical calculations such as correlation coefficients and root‐mean‐square deviation values are helpful only in assessing gross overall model performance. Statistical decision‐making tests, based on the acceptance or rejection of certain hypotheses within a specified level of significance, yield overly optimistic or incorrect results because several assumptions necessary for proper use of the tests are violated.
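
The two time-series statistics named above can be illustrated with a minimal sketch. This is not MSAP's implementation: the function names `rmsd` and `correlation` and the sample water levels are hypothetical, chosen only to show what each quantity measures on paired actual and computed series.

```python
import math


def rmsd(actual, computed):
    """Root-mean-square deviation between two equal-length series."""
    if len(actual) != len(computed) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((c - a) ** 2 for a, c in zip(actual, computed))
    return math.sqrt(squared / len(actual))


def correlation(actual, computed):
    """Pearson correlation coefficient between two equal-length series."""
    if len(actual) != len(computed) or len(actual) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_c = sum(computed) / n
    cov = sum((a - mean_a) * (c - mean_c) for a, c in zip(actual, computed))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_c = sum((c - mean_c) ** 2 for c in computed)
    if var_a == 0 or var_c == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_c)


# Hypothetical water levels in metres (illustrative only, not study data).
actual_levels = [0.0, 0.5, 1.5, 1.0, 0.2]
# A hypothetical model output that is uniformly 0.1 m too high.
computed_levels = [a + 0.1 for a in actual_levels]

r = correlation(actual_levels, computed_levels)
deviation = rmsd(actual_levels, computed_levels)
print(f"correlation = {r:.3f}, RMSD = {deviation:.3f} m")
```

In this example a constant bias leaves the correlation essentially perfect while the deviation is nonzero. Each statistic is a single number for the whole record, which is consistent with the point that such measures describe only gross overall performance.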