Recognizing Obesity and Comorbidities in Sparse Data
Open Access
- 1 July 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 16 (4), 561-570
- https://doi.org/10.1197/jamia.m3115
Abstract
In order to survey, facilitate, and evaluate studies of medical language processing on clinical narratives, i2b2 (Informatics for Integrating Biology to the Bedside) organized its second challenge and workshop. This challenge focused on automatically extracting information on obesity and fifteen of its most common comorbidities from patient discharge summaries. For each patient, obesity and any of the comorbidities could be Present, Absent, or Questionable (i.e., possible) in the patient, or Unmentioned in the discharge summary of the patient. i2b2 provided data for, and invited the development of, automated systems that can classify obesity and its comorbidities into these four classes based on individual discharge summaries. This article refers to obesity and comorbidities as diseases. It refers to the categories Present, Absent, Questionable, and Unmentioned as classes. The task of classifying obesity and its comorbidities is called the Obesity Challenge. The data released by i2b2 was annotated for textual judgments reflecting the explicitly reported information on diseases, and intuitive judgments reflecting medical professionals' reading of the information presented in discharge summaries. There were very few examples of some disease classes in the data. The Obesity Challenge paid particular attention to the performance of systems on these less well-represented classes. A total of 30 teams participated in the Obesity Challenge. Each team was allowed to submit two sets of up to three system runs for evaluation, resulting in a total of 136 submissions. The submissions represented a combination of rule-based and machine learning approaches. Evaluation of system runs shows that the best predictions of textual judgments come from systems that filter the potentially noisy portions of the narratives, project dictionaries of disease names onto the remaining text, apply negation extraction, and process the text through rules. 
Information on disease-related concepts, such as symptoms and medications, and general medical knowledge help systems infer intuitive judgments on the diseases.
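The pipeline the abstract credits with the best textual judgments (filter noisy text, project a disease dictionary, apply negation extraction, then apply rules) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from any participating system: the tiny dictionary, the cue lists, the "Family history:" filter, the look-behind negation check within a sentence, and the Present > Absent > Questionable precedence rule.

```python
import re

# Hypothetical mini-dictionary of disease surface forms (illustrative only).
DISEASE_TERMS = {
    "Obesity": ["obesity", "obese"],
    "Hypertension": ["hypertension", "htn"],
    "Asthma": ["asthma"],
}
# Hypothetical cue lists; real systems use far larger, curated lists.
NEGATION_CUES = ["no", "denies", "without", "negative for"]
UNCERTAINTY_CUES = ["possible", "probable", "questionable", "rule out"]
# Assumption: lines under these headers describe someone other than the patient.
NOISY_HEADERS = ("family history:",)


def filter_noisy(text):
    """Step 1: drop lines that start with a header treated as noisy."""
    kept = [
        line for line in text.splitlines()
        if not line.strip().lower().startswith(NOISY_HEADERS)
    ]
    return "\n".join(kept)


def has_cue(prefix, cues):
    """Return True if any cue appears as a whole word or phrase in prefix."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", prefix) for cue in cues)


def classify(text):
    """Steps 2-4: dictionary lookup, negation check, then a precedence rule."""
    clean = filter_noisy(text).lower()
    sentences = re.split(r"[.\n]+", clean)
    result = {}
    for disease, terms in DISEASE_TERMS.items():
        found = set()
        for sentence in sentences:
            for term in terms:
                match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                if not match:
                    continue
                # Only text before the mention in the same sentence is checked.
                prefix = sentence[:match.start()]
                if has_cue(prefix, NEGATION_CUES):
                    found.add("Absent")
                elif has_cue(prefix, UNCERTAINTY_CUES):
                    found.add("Questionable")
                else:
                    found.add("Present")
        # Assumed precedence when mentions conflict; no mention at all
        # falls through to Unmentioned.
        for label in ("Present", "Absent", "Questionable"):
            if label in found:
                result[disease] = label
                break
        else:
            result[disease] = "Unmentioned"
    return result


note = (
    "History: The patient is obese. Denies asthma.\n"
    "Family history: hypertension in mother.\n"
)
print(classify(note))
```

On this toy note the sketch labels obesity Present, asthma Absent, and hypertension Unmentioned, because the only hypertension mention sits on the filtered family-history line. It addresses textual judgments only; intuitive judgments would need the disease-related concepts and medical knowledge the abstract mentions.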