Using compound codes for automatic classification of clinical diagnoses.
- 1 January 2004
- journal article
- Vol. 107, 411-5
Abstract
Classification of diagnoses (a.k.a. coding) is the central part of current concept-based medical IR systems. Some classification systems contain over 30,000 distinct codes, which makes classifying clinical documents a time-consuming, labor-intensive, and error-prone process. This paper presents a simple methodology for cleaning up and reusing existing manually coded diagnostic statements, mainly extracted from clinical notes, to build predictive models using a sparse-feature implementation of a Naïve Bayes classifier. One of the problems addressed is that diagnostic statements often contain several diagnoses and are assigned several codes, resulting in a multi-class classification problem. We investigate one possible way of addressing this problem by introducing compound (multiple-code) categories. We present experimental results of classifying more than 16,000 randomly selected diagnostic strings into 19 top-level categories. A small improvement (3%) from using compound categories rather than simple categories indicates that using multiple-code categories is a promising solution, although clearly in need of further research and refinement.
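The idea of compound categories can be illustrated with a minimal sketch. It is not the authors' implementation: the category labels (`CIRC`, `ENDO`), the `+` separator, the whitespace tokenizer, the Laplace smoothing, and the toy training strings are all assumptions made for illustration. Each diagnostic statement's set of codes is collapsed into a single compound label, and a multinomial Naïve Bayes model that stores only nonzero word counts is trained on those labels.

```python
import math
from collections import Counter, defaultdict


def compound_label(codes):
    """Collapse a statement's codes into one category, e.g. ["ENDO", "CIRC"] -> "CIRC+ENDO"."""
    return "+".join(sorted(set(codes)))


class SparseNaiveBayes:
    """Multinomial Naive Bayes that keeps only nonzero word counts per class."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing (assumed)
        self.class_docs = Counter()              # statements seen per class
        self.word_counts = defaultdict(Counter)  # class -> sparse word counts
        self.class_tokens = Counter()            # total tokens per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.class_tokens[label] += len(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for label in self.class_docs:
            score = math.log(self.class_docs[label] / total_docs)
            denom = self.class_tokens[label] + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical toy data: statements with one or more top-level category codes.
texts = [
    "essential hypertension",
    "type 2 diabetes mellitus",
    "hypertension and diabetes mellitus",
    "hypertension with diabetes",
]
codes = [["CIRC"], ["ENDO"], ["CIRC", "ENDO"], ["ENDO", "CIRC"]]

labels = [compound_label(c) for c in codes]
model = SparseNaiveBayes().fit(texts, labels)
```

With compound labels, a statement that mentions two diagnoses is assigned to one combined class instead of being forced into either single-code class; the cost is a larger label set with fewer training examples per label.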