The Role of Domain Knowledge in Automating Medical Text Report Classification

Open Access

28 March 2003

journal article
research article
Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association

Vol. 10 (4) , 330-338
https://doi.org/10.1197/jamia.m1157

Abstract

Objective: To analyze the effect of expert knowledge on the inductive learning process in creating classifiers for medical text reports.

Design: The authors converted medical text reports to a structured form through natural language processing. They then inductively created classifiers for medical text reports using varying degrees and types of expert knowledge and different inductive learning algorithms. The authors measured performance of the different classifiers as well as the costs to induce classifiers and acquire expert knowledge.

Measurements: The measurements used were classifier performance, training-set size efficiency, and classifier creation cost.

Results: Expert knowledge was shown to be the most significant factor affecting inductive learning performance, outweighing differences in learning algorithms. The use of expert knowledge can affect comparisons between learning algorithms. This expert knowledge may be obtained and represented separately as knowledge about the clinical task or about the data representation used. The benefit of the expert knowledge is more than that of inductive learning itself, with less cost to obtain.

Conclusion: For medical text report classification, expert knowledge acquisition is more significant to performance and more cost-effective to obtain than knowledge discovery. Building classifiers should therefore focus more on acquiring knowledge from experts than trying to learn this knowledge inductively.
