Automatic Coding of Diagnostic Reports

Abstract
A method is presented for assigning classification codes to pathology reports by searching for similar reports in an archive collection. The search key is textual similarity, which serves as an estimate of the true, semantic similarity. The method does not require explicit modeling and can be applied to any language or application domain that uses natural language reporting. A number of simulation experiments were run to assess the accuracy of the method and to indicate the roles of archive size and of transferring document collections across laboratories. In at least 63% of the simulation trials, the most similar archive text offered a suitable classification on organ, origin and diagnosis. In 85 to 90% of the trials, the archive's best solution was found among the five most similar reports. The results indicate that the method is suitable for its purpose: suggesting potentially correct classifications to the reporting diagnostician.
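
The retrieval idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure was used, so the cosine similarity over word counts below is an assumption, as are the function names, the example reports and the code labels, all of which are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def suggest_codes(report, archive, k=5):
    """Return the codes of the k archive reports most similar to `report`.

    `archive` is a list of (report_text, code) pairs. Reports that share
    no words with the query are left out of the suggestions.
    """
    query = Counter(tokenize(report))
    scored = [(cosine(query, Counter(tokenize(text))), code)
              for text, code in archive]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for score, code in scored[:k] if score > 0]


# Hypothetical archive: the texts and code labels are invented for illustration.
ARCHIVE = [
    ("skin biopsy showing benign intradermal naevus", "skin/naevus"),
    ("colon biopsy with tubular adenoma low grade dysplasia", "colon/adenoma"),
    ("skin excision basal cell carcinoma completely excised",
     "skin/basal-cell-carcinoma"),
]

suggestions = suggest_codes("biopsy of skin, intradermal naevus", ARCHIVE)
```

Returning a short ranked list rather than a single code mirrors the stated purpose of the method: the diagnostician receives a handful of candidate classifications and picks the appropriate one.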
