Properties of R2 statistics for logistic regression
- 29 July 2005
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 25 (8), 1383-1395
- https://doi.org/10.1002/sim.2300
Abstract
Various R2 statistics have been proposed for logistic regression to quantify the extent to which the binary response can be predicted by a given logistic regression model and covariates. We study the asymptotic properties of three popular variance-based R2 statistics. We find that two variance-based R2 statistics, the sum of squares and the squared Pearson correlation, have identical asymptotic distribution whereas the third one, Gini's concentration measure, has a different asymptotic behaviour and may overstate the predictivity of the model and covariates when the model is misspecified. Our result not only provides a theoretical basis for the findings in previous empirical and numerical work, but also leads to asymptotic confidence intervals. Statistical variability can then be taken into account when assessing the predictive value of a logistic regression model. Copyright © 2005 John Wiley & Sons, Ltd.
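The abstract names the three statistics without giving formulas. As a rough illustration, the sketch below computes them from a binary response y and fitted probabilities p, assuming the definitions commonly used in the explained-variation literature: sum of squares as 1 - SSE/SST, the squared Pearson correlation between y and p, and Gini's concentration measure as 1 - sum(p(1 - p)) / (n * ybar * (1 - ybar)). These formulas, the function names, and the toy data are assumptions made for illustration and are not taken from the article.

```python
from math import fsum


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return fsum(values) / len(values)


def r2_sum_of_squares(y, p):
    """Sum-of-squares R2: 1 - SSE/SST (assumed definition)."""
    y_bar = _mean(y)
    sse = fsum((yi - pi) ** 2 for yi, pi in zip(y, p))
    sst = fsum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def r2_pearson(y, p):
    """Squared Pearson correlation between y and p (assumed definition)."""
    y_bar, p_bar = _mean(y), _mean(p)
    cov = fsum((yi - y_bar) * (pi - p_bar) for yi, pi in zip(y, p))
    var_y = fsum((yi - y_bar) ** 2 for yi in y)
    var_p = fsum((pi - p_bar) ** 2 for pi in p)
    return cov ** 2 / (var_y * var_p)


def r2_gini(y, p):
    """Gini's concentration measure (assumed definition)."""
    y_bar = _mean(y)
    within = fsum(pi * (1.0 - pi) for pi in p)
    total = len(y) * y_bar * (1.0 - y_bar)
    return 1.0 - within / total


# Made-up toy data: four binary outcomes and hypothetical predicted
# probabilities. These are not maximum likelihood fits from any real model.
y = [0, 0, 1, 1]
p = [0.2, 0.4, 0.6, 0.8]

print("sum of squares:", r2_sum_of_squares(y, p))
print("squared Pearson:", r2_pearson(y, p))
print("Gini concentration:", r2_gini(y, p))
```

On toy inputs like these the three values can differ substantially. This small example does not reproduce the asymptotic behaviour described above; it only shows how each statistic is computed under the stated assumptions.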