Evaluating uses of data mining techniques in propensity score estimation: a simulation study
- 3 March 2008
- journal article
- research article
- Published by Wiley in Pharmacoepidemiology and Drug Safety
- Vol. 17 (6), 546-555
- https://doi.org/10.1002/pds.1555
Abstract
Background: In propensity score modeling, it is standard practice to optimize the prediction of exposure status based on the covariate information. In a simulation study, we examined in what situations analyses based on various types of exposure propensity score (EPS) models using data mining techniques such as recursive partitioning (RP) and neural networks (NN) produce unbiased and/or efficient results.
Method: We simulated data for a hypothetical cohort study (n = 2000) with a binary exposure/outcome and 10 binary/continuous covariates with seven scenarios differing by non‐linear and/or non‐additive associations between exposure and covariates. EPS models used logistic regression (LR) (all possible main effects), RP1 (without pruning), RP2 (with pruning), and NN. We calculated c‐statistics (C), standard errors (SE), and bias of exposure‐effect estimates from outcome models for the PS‐matched dataset.
Results: Data mining techniques yielded higher C than LR (mean: NN, 0.86; RP1, 0.79; RP2, 0.72; and LR, 0.76). SE tended to be greater in models with higher C. Overall bias was small for each strategy, although NN estimates tended to be the least biased. C was not correlated with the magnitude of bias (correlation coefficient [COR] = −0.3, p = 0.1) but was correlated with increased SE (COR = 0.7, p < 0.001).
Conclusions: Effect estimates from EPS models by simple LR were generally robust. NN models generally provided the least numerically biased estimates. C was not associated with the magnitude of bias but was associated with increased SE.
Copyright © 2008 John Wiley & Sons, Ltd.
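The c‐statistic reported in the abstract measures how well an exposure propensity score separates exposed from unexposed subjects. As a minimal sketch, assuming the usual concordance definition (the share of exposed/unexposed pairs in which the exposed subject has the higher score, with ties counted as one half), it could be computed as below. The function name `c_statistic` and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import product


def c_statistic(scores, exposed):
    """Concordance (c-statistic) of propensity scores against exposure status.

    scores  -- estimated propensity scores, one per subject
    exposed -- 1 for exposed subjects, 0 for unexposed subjects
    """
    pos = [s for s, e in zip(scores, exposed) if e == 1]
    neg = [s for s, e in zip(scores, exposed) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one exposed and one unexposed subject")

    concordant = 0.0
    # Compare every exposed subject with every unexposed subject.
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0   # exposed subject ranked higher: concordant pair
        elif p == n:
            concordant += 0.5   # tie: counted as half a concordant pair
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.4]
exposed = [1, 1, 1, 0, 0, 0]
print(c_statistic(scores, exposed))
```

A value of 0.5 corresponds to scores that do not discriminate exposure at all, and 1.0 to perfect separation. The pairwise loop is quadratic in the number of subjects, which is fine for a sketch but would usually be replaced by a rank-based calculation for a cohort of the size simulated in the study.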