Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research

Open Access

1 August 2008

journal article
review article
Published by Georg Thieme Verlag KG in Yearbook of Medical Informatics

Vol. 17 (01), 128-144
https://doi.org/10.1055/s-0038-1638592

Abstract

Objectives: We examine recent published research on the extraction of information from textual documents in the Electronic Health Record (EHR).

Methods: Literature review of the research published after 1995, based on PubMed, conference proceedings, and the ACM Digital Library, as well as on relevant publications referenced in papers already included.

Results: 174 publications were selected and are discussed in this review in terms of methods used; pre-processing of textual documents; contextual features detection and analysis; extraction of information in general; extraction of codes and of information for decision-support and enrichment of the EHR; information extraction for surveillance, research, automated terminology management, and data mining; and de-identification of clinical text.

Conclusions: Performance of information extraction systems with clinical text has improved since the last systematic review in 1995, but they are still rarely applied outside of the laboratory they have been developed in. Competitive challenges for information extraction from clinical text, along with the availability of annotated clinical text corpora, and further improvements in system performance are important factors to stimulate advances in this field and to increase the acceptance and usage of these systems in concrete clinical and biomedical research contexts.
