Variable selection under multiple imputation using the bootstrap in a prognostic study
Open Access
- 13 July 2007
- journal article
- research article
- Published by Springer Nature in BMC Medical Research Methodology
- Vol. 7 (1), 1-10
- https://doi.org/10.1186/1471-2288-7-33
Abstract
Missing data are a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty, which allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables, the proportion of missing data ranged from 0 to 48.1%. We used four methods to investigate the respective influence of sampling and imputation variation: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined, at inclusion levels ranging from 0% (full model) to 90%, bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. We recommend accounting for both imputation and sampling variation in data sets with missing values. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive for application to data sets with missing values.
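The inclusion frequency described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how MI and bootstrapping are nested, so the ordering here (impute first, then draw bootstrap samples from each completed data set) is an assumption. The names `inclusion_frequencies` and `variables_at_level` are hypothetical. The hot-deck imputation and the correlation cutoff are simple stand-ins for a real MI procedure and a real selection method, chosen only to keep the example self-contained.

```python
import numpy as np


def inclusion_frequencies(data, impute, select,
                          n_imputations=5, n_bootstraps=20, seed=0):
    """Proportion of fitted models in which each variable was selected.

    impute(data, rng) -> completed copy of data (hypothetical interface)
    select(sample)    -> boolean mask over the candidate variables
    Assumed nesting: impute first, then bootstrap each completed data set.
    """
    rng = np.random.default_rng(seed)
    n_rows = data.shape[0]
    masks = []
    for _ in range(n_imputations):
        completed = impute(data, rng)
        for _ in range(n_bootstraps):
            # Resample rows with replacement from the completed data set.
            rows = rng.integers(0, n_rows, size=n_rows)
            masks.append(select(completed[rows]))
    # Averaging boolean masks gives the per-variable inclusion frequency.
    return np.mean(masks, axis=0)


def variables_at_level(freq, names, level):
    """Names of variables whose inclusion frequency reaches the given level."""
    return [name for name, f in zip(names, freq) if f >= level]


def hot_deck_impute(data, rng):
    """Stand-in imputation: fill gaps with random observed values per column."""
    completed = data.copy()
    for j in range(completed.shape[1]):
        missing = np.isnan(completed[:, j])
        observed = completed[~missing, j]
        completed[missing, j] = rng.choice(observed, size=missing.sum())
    return completed


def correlation_select(sample, cutoff=0.3):
    """Stand-in selection: keep predictors correlated with the last column."""
    predictors, outcome = sample[:, :-1], sample[:, -1]
    corr = [abs(np.corrcoef(predictors[:, j], outcome)[0, 1])
            for j in range(predictors.shape[1])]
    return np.array(corr) >= cutoff


# Toy data: x0 drives the outcome, x1 is noise, about 20% of x0 is missing.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
y = (x[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
x[rng.random(200) < 0.2, 0] = np.nan
data = np.column_stack([x, y])

freq = inclusion_frequencies(data, hot_deck_impute, correlation_select)
print(dict(zip(["x0", "x1"], freq)))
print(variables_at_level(freq, ["x0", "x1"], level=0.9))
```

Raising `level` yields a sparser model. A level of 0 keeps every candidate variable, which corresponds to the full model mentioned in the abstract.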