Three approaches to automatic assignment of ICD-9-CM codes to radiology reports.

  • 11 October 2007
    • journal article
    • research article
    • Vol. 2007, 279-83
Abstract
We describe and evaluate three systems for automatically predicting the ICD-9-CM codes of radiology reports from short excerpts of text. The first system benefits from an open source search engine, Lucene, and takes advantage of the relevance of reports to one another based on individual words. The second uses BoosTexter, a boosting algorithm based on n-grams (sequences of consecutive words) and s-grams (sequences of non-consecutive words) extracted from the reports. The third employs a set of hand-crafted rules that capture lexical elements (short, meaningful, strings of words) derived from BoosTexter's n-grams, and that are enhanced by shallow semantic information in the form of negation, synonymy, and uncertainty. Our evaluation shows that semantic information significantly contributes to ICD-9-CM coding with lexical elements. Also, a simple hand-crafted rule-based system with lexical elements and semantic information can outperform algorithmically more complex systems, such as Lucene and BoosTexter, when these systems base their ICD-9-CM predictions only upon individual words, n-grams, or s grams.