PRESS-related statistics

Abstract
HOLIDAY, D. B., J. E. BALLARD, and B. C. McKEOWN. PRESS-related statistics: regression tools for cross-validation and case diagnostics. Med. Sci. Sports Exerc., Vol. 27, No. 4, pp. 612–620, 1995. In the health science literature, a common approach to validating a regression equation is data-splitting, where a portion of the data fits the model (fitting sample) and the remainder (validation sample) estimates future performance. The R² and SEE obtained by predicting the validation sample with the fitting sample equation are proper estimates of future performance, tending to correct for the natural upward bias of the R² and SEE obtained from the fitting sample alone. Data-splitting has several disadvantages, however. These include: 1) difficulty, arbitrariness, and inconvenience of matching samples; 2) the need to report two sets of statistics to determine homogeneity; and 3) the lack of equation stability due to diluted sample size. The PRESS statistic and associated residuals do not require the data to be split, yield alternative unbiased estimates of R² and SEE, and provide useful case diagnostics. This procedure is easy to use and widely available in modern statistical packages, but is rarely utilized. The two methods are contrasted here using a simulation from original data for predicting body density from anthropometric measurements of a group of 117 women. The PRESS approach is particularly appropriate for smaller datasets; methods of reporting these statistics are recommended.
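
As an illustration of the idea, the following is a minimal sketch of how PRESS residuals can be computed for an ordinary least-squares fit without refitting the model n times. It relies on the standard identity that the leave-one-out residual for case i equals the ordinary residual divided by (1 - h_ii), where h_ii is the leverage of case i. The function name `press_statistics` is hypothetical, and the formulas used for the PRESS-based R² (1 - PRESS/SS_total) and SEE (the square root of PRESS/n) are assumed conventions, not details confirmed by the abstract.

```python
import numpy as np

def press_statistics(X, y):
    """Sketch: PRESS, PRESS-based R^2 and SEE for an OLS fit with intercept.

    X : (n, p) array of predictors (no intercept column); y : (n,) response.
    Assumes the design matrix has full column rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept

    # Leverages h_ii: squared row norms of Q from the reduced QR of A.
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q**2, axis=1)

    # Ordinary least-squares fit and residuals on the full sample.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # PRESS residuals: prediction error for each case when it is left out.
    press_resid = resid / (1.0 - h)
    press = float(np.sum(press_resid**2))

    # Assumed reporting conventions (see lead-in).
    ss_total = float(np.sum((y - y.mean())**2))
    r2_press = 1.0 - press / ss_total
    see_press = float(np.sqrt(press / n))
    return press, r2_press, see_press, press_resid
```

Cases with unusually large PRESS residuals relative to their ordinary residuals are high-leverage observations, which is what makes these residuals useful as case diagnostics.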
