Repeated split sample validation to assess logistic regression and recursive partitioning: an application to the prediction of cognitive impairment
- 8 September 2005
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 24 (19), 3019-3035
- https://doi.org/10.1002/sim.2154
Abstract
Screening strategies play an important part in the identification and diagnosis of illness. Testing of such strategies in a clinical trial can have important implications for the treatment of such illnesses. Before the clinical trial, however, it is important to develop a practical screening/classification procedure that accurately predicts the presence of the illness in question. Recent published studies have shown a growing preference for classification tree/recursive partitioning procedures. This paper compares the application of logistic regression and recursive partitioning to a neuropsychological data set of 252 patients recruited from four Veterans Affairs Medical Centers. Logistic regression and recursive partitioning were used to predict cognitive impairment in 12 randomly selected exploratory/validation samples. We assessed the effect of sampling on variable selection and predictive accuracy. Predictive accuracy of the logistic regression and recursive partitioning procedures was comparable across the exploratory data samples but varied across the validation samples. Based on shrinkage, both classification procedures performed equally well for the prediction of cognitive impairment across the twelve samples. While logistic regression provided an estimated probability of outcome for each patient, it required several mathematical calculations to do so. However, logistic regression selected one or two fewer predictors than recursive partitioning with comparable predictive accuracy. Recursive partitioning, on the other hand, readily identified patient characteristics and variable interactions, was easy to interpret clinically and required no mathematical calculations. There was a high degree of overlap of the predictor variables between the two procedures.
In the context of neuropsychological screening, logistic regression and recursive partitioning performed equally well and were quite stable in the selection of predictors for the identification of patients with cognitive impairment, although recursive partitioning may be easier to use in a clinical setting because it is based on a simple decision tree. Copyright © 2005 John Wiley & Sons, Ltd.
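
The repeated split-sample procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the data are synthetic, the 50/50 exploratory/validation split is an assumption (the abstract does not give the ratio), shrinkage is taken here to mean exploratory accuracy minus validation accuracy (an assumed definition), and `fit_stump` is a hypothetical depth-one threshold split standing in for a full recursive partitioning or logistic regression fit. Only the sample size (252) and the number of splits (12) come from the abstract.

```python
import random


def accuracy(predict, rows):
    """Fraction of (features, label) rows that predict() classifies correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def fit_stump(rows):
    """Hypothetical stand-in classifier: the best single threshold split
    on one feature (a depth-one tree), chosen by exploratory accuracy."""
    best_acc, best_predict = -1.0, None
    n_features = len(rows[0][0])
    for j in range(n_features):
        for t in sorted({x[j] for x, _ in rows}):
            for hi in (0, 1):  # label assigned when the feature exceeds t
                def predict(x, j=j, t=t, hi=hi):
                    return hi if x[j] > t else 1 - hi
                acc = accuracy(predict, rows)
                if acc > best_acc:
                    best_acc, best_predict = acc, predict
    return best_predict


def repeated_split_validation(rows, fit, n_splits=12, explore_frac=0.5, seed=0):
    """Repeat a random exploratory/validation split n_splits times.

    explore_frac=0.5 is an assumption; the abstract does not state the ratio.
    Shrinkage is computed as exploratory minus validation accuracy.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * explore_frac)
        explore, validate = shuffled[:cut], shuffled[cut:]
        predict = fit(explore)  # model sees the exploratory sample only
        acc_explore = accuracy(predict, explore)
        acc_validate = accuracy(predict, validate)
        results.append({
            "exploratory": acc_explore,
            "validation": acc_validate,
            "shrinkage": acc_explore - acc_validate,
        })
    return results


def make_synthetic(n=252, seed=1):
    """Synthetic two-feature data; the outcome depends noisily on feature 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        y = 1 if x[0] + rng.gauss(0, 0.5) > 0 else 0
        rows.append((x, y))
    return rows


rows = make_synthetic()
results = repeated_split_validation(rows, fit_stump)
mean_shrinkage = sum(r["shrinkage"] for r in results) / len(results)
print(f"mean shrinkage over {len(results)} splits: {mean_shrinkage:.3f}")
```

Passing a different `fit` function (for example, one wrapping a logistic regression) to `repeated_split_validation` with the same `seed` would evaluate both procedures on identical splits, which is the kind of side-by-side comparison the abstract reports.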