Prediction Modeling Using EHR Data
- 1 June 2010
- journal article
- comparative effectiveness
- Published by Wolters Kluwer Health in Medical Care
- Vol. 48 (6), S106-S113
- https://doi.org/10.1097/mlr.0b013e3181de9e17
Abstract
Background: Electronic health record (EHR) databases contain vast amounts of information about patients. Machine learning techniques such as Boosting and support vector machines (SVM) can potentially identify patients at high risk for serious conditions, such as heart disease, from EHR data. However, these techniques have not yet been widely tested.
Objective: To model detection of heart failure more than 6 months before the actual date of clinical diagnosis using machine learning techniques applied to EHR data, and to compare the performance of logistic regression, SVM, and Boosting, along with various variable selection methods, in heart failure prediction.
Research Design: Geisinger Clinic primary care patients with EHR data from 2001 to 2006 who were diagnosed with heart failure between 2003 and 2006 were identified. For this nested case-control study, controls were randomly selected and matched on sex, age, and clinic.
Measures: The area under the receiver operating characteristic curve (AUC) was computed for each method using 10-fold cross-validation. The number of variables selected by each method was compared.
Results: Logistic regression with model selection based on the Bayesian information criterion provided the most parsimonious model, with about 10 variables selected on average, while maintaining a high AUC (0.77 in 10-fold cross-validation). Boosting with a strict variable importance threshold provided similar performance.
Conclusions: Heart failure was predicted more than 6 months before clinical diagnosis, with an AUC of about 0.76, using logistic regression and Boosting. These results were achieved even with strict model selection criteria. SVM had the poorest performance, possibly because of imbalanced data.
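The evaluation described under Measures (AUC from 10-fold cross-validation, compared across logistic regression, SVM, and Boosting) can be illustrated with a minimal sketch. This is not the authors' code or data: the feature matrix is synthetic, the 90/10 class balance is an assumption, scikit-learn's `GradientBoostingClassifier` stands in for whichever Boosting variant the study used, and all hyperparameters are illustrative defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an EHR-derived feature matrix (assumption: this is
# NOT the Geisinger data; about 10% of samples are labeled as cases).
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)

# The three model families named in the abstract. Scaling is added for the
# linear model and the SVM; the boosting variant here is an assumption.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the case/control ratio roughly constant per fold,
# which matters when positives are rare.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def mean_cv_auc(model):
    """Return the mean AUC over the 10 cross-validation folds."""
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return scores.mean()


results = {name: mean_cv_auc(model) for name, model in models.items()}

for name, auc in results.items():
    print(f"{name}: mean 10-fold AUC = {auc:.3f}")
```

The numbers printed by this sketch depend entirely on the synthetic data and say nothing about the study's reported AUCs; the point is only the shape of the comparison, where every model is scored on the same folds with the same metric.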