Evaluating Natural Language Processors in the Clinical Domain

1 October 1998

journal article
review article
Published by Georg Thieme Verlag KG in Methods of Information in Medicine

Vol. 37 (04/05), 334-344
https://doi.org/10.1055/s-0038-1634566

Abstract

Evaluating natural language processing (NLP) systems in the clinical domain is a difficult task which is important for advancement of the field. A number of NLP systems have been reported that extract information from free-text clinical reports, but not many of the systems have been evaluated. Those that were evaluated noted good performance measures but the results were often weakened by ineffective evaluation methods. In this paper we describe a set of criteria aimed at improving the quality of NLP evaluation studies. We present an overview of NLP evaluations in the clinical domain and also discuss the Message Understanding Conferences (MUC) [1-4]. Although these conferences constitute a series of NLP evaluation studies performed outside of the clinical domain, some of the results are relevant within medicine. In addition, we discuss a number of factors which contribute to the complexity that is inherent in the task of evaluating natural language systems.

This publication has 6 references indexed in Scilit:

Development and Evaluation of a Computerized Admission Diagnoses Encoding System
Computers and Biomedical Research, 1996
Unlocking Clinical Data from Narrative Reports: A Study of Natural Language Processing
Annals of Internal Medicine, 1995
Natural language processing in an operational clinical information system
Natural Language Engineering, 1995
Design of the MUC-6 evaluation
Published by Association for Computational Linguistics (ACL), 1995
A General Natural-language Text Processor for Clinical Radiology
Journal of the American Medical Informatics Association, 1994
Natural Language Processing and the Representation of Clinical Data
Journal of the American Medical Informatics Association, 1994