Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system
Open Access
- 26 July 2006
- journal article
- research article
- Published by Springer Nature in BMC Medical Informatics and Decision Making
- Vol. 6 (1) , 30
- https://doi.org/10.1186/1472-6947-6-30
Abstract
The text descriptions in electronic medical records are a rich source of information. We have developed a Health Information Text Extraction (HITEx) tool and used it to extract key findings for a research study on airways disease. The principal diagnosis, co-morbidity and smoking status extracted by HITEx from a set of 150 discharge summaries were compared to an expert-generated gold standard. The accuracy of HITEx was 82% for principal diagnosis, 87% for co-morbidity, and 90% for smoking status extraction, when cases labeled "Insufficient Data" by the gold standard were excluded. We consider the results promising, given the complexity of the discharge summaries and the extraction tasks.
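The accuracy figures above are computed only over cases the gold standard did not label "Insufficient Data". A minimal sketch of that kind of calculation follows. It is not the authors' evaluation code: the function name, the label strings other than "Insufficient Data", and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Sequence

# Label the abstract says was excluded from scoring.
INSUFFICIENT = "Insufficient Data"


def accuracy_excluding_insufficient(
    predicted: Sequence[str], gold: Sequence[str]
) -> float:
    """Fraction of exact label matches, skipping gold-standard
    cases labeled "Insufficient Data" (hypothetical helper)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    # Keep only the cases where the gold standard has a usable label.
    scored = [(p, g) for p, g in zip(predicted, gold) if g != INSUFFICIENT]
    if not scored:
        raise ValueError("no scorable cases after exclusion")

    correct = sum(1 for p, g in scored if p == g)
    return correct / len(scored)


# Toy data, invented for illustration; not taken from the study.
gold = ["Current Smoker", "Non-Smoker", "Insufficient Data",
        "Past Smoker", "Non-Smoker"]
pred = ["Current Smoker", "Non-Smoker", "Non-Smoker",
        "Current Smoker", "Non-Smoker"]

print(accuracy_excluding_insufficient(pred, gold))
```

In this toy example, four of the five cases are scored, because one gold label is "Insufficient Data", and three of those four match. Excluding such cases changes the denominator, so the resulting figure is not directly comparable to an accuracy computed over all 150 summaries.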