Classification of Cancer Stage from Free-text Histology Reports
- 1 August 2006
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 2006 (ISSN 1557-170X), pp. 5153-5156
- https://doi.org/10.1109/iembs.2006.259563
Abstract
This article investigates the classification of a patient's lung cancer stage based on analysis of their free-text medical reports. The system uses natural language processing to transform the report text, including identification of UMLS terms and detection of negated findings. The transformed report is then classified using statistical machine learning techniques. A support vector machine is trained for each stage category based on word occurrences in a corpus of histology reports for pathologically staged patients. Each new report is assigned its most likely stage, allowing population stage data to be collected for analysis of outcomes. While the system could in principle be applied to other cancer types, the current work focuses on lung cancer due to data availability. The article presents initial experiments quantifying system performance for T and N staging on a corpus of histology reports from more than 700 lung cancer patients.
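The pipeline the abstract outlines (negation-aware word-occurrence features, one SVM per stage category, and assignment of the highest-scoring stage) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: UMLS term identification is omitted; negation uses a simplified NegEx-style trigger window whose trigger list and window size are hypothetical; and each linear SVM is trained with the Pegasos subgradient method, a swapped-in solver that the paper does not name. The functions `features`, `train_binary_svm`, `train_one_vs_rest`, and `predict`, and the toy reports, are all hypothetical.

```python
import re
from collections import Counter

# Simplified NegEx-style negation handling (assumption, not the paper's exact rules):
# tokens within WINDOW words after a trigger, up to the end of the sentence, are
# prefixed with "NEG_" so that "no metastatic carcinoma" differs from "metastatic carcinoma".
NEG_TRIGGERS = {"no", "not", "without", "negative"}  # hypothetical trigger list
WINDOW = 5  # hypothetical scope length in tokens


def features(text):
    """Map a report to word-occurrence counts, marking negated words."""
    feats = Counter({"__bias__": 1})  # constant feature acts as an intercept
    remaining = 0  # tokens left in the current negation scope
    for tok in re.findall(r"[a-z0-9]+|\.", text.lower()):
        if tok == ".":
            remaining = 0  # a sentence boundary ends the negation scope
        elif tok in NEG_TRIGGERS:
            remaining = WINDOW
        elif remaining > 0:
            feats["NEG_" + tok] += 1
            remaining -= 1
        else:
            feats[tok] += 1
    return feats


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(X, y, lam=0.01, epochs=50):
    """Linear SVM via Pegasos subgradient steps on the L2-regularized hinge loss.

    y holds +1/-1 labels; examples are visited in a fixed order for reproducibility.
    """
    w, t = {}, 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1  # margin checked before the update
            for f in w:
                w[f] *= 1.0 - eta * lam  # shrink step from the regularizer
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * label * v
    return w


def train_one_vs_rest(reports, stages, **kwargs):
    """Train one binary SVM per stage category (that stage vs. all others)."""
    X = [features(r) for r in reports]
    return {
        s: train_binary_svm(X, [1 if st == s else -1 for st in stages], **kwargs)
        for s in sorted(set(stages))
    }


def predict(models, report):
    """Return the stage whose SVM gives the highest decision score."""
    x = features(report)
    return max(models, key=lambda s: dot(models[s], x))


# Invented toy reports for illustration only; not data from the paper.
TOY_REPORTS = [
    "Lymph nodes: no metastatic carcinoma identified.",
    "All lymph nodes negative for malignancy.",
    "Metastatic carcinoma present in hilar lymph node.",
    "Hilar node shows metastatic carcinoma.",
]
TOY_STAGES = ["N0", "N0", "N1", "N1"]

if __name__ == "__main__":
    models = train_one_vs_rest(TOY_REPORTS, TOY_STAGES)
    print(predict(models, "Hilar lymph node with metastatic carcinoma."))  # → N1
    print(predict(models, "No metastatic carcinoma in lymph nodes."))  # → N0
```

Because negated words become separate `NEG_` features, the toy N0 reports, which mention "metastatic carcinoma" only after "no", contribute `NEG_metastatic` and `NEG_carcinoma` rather than the bare terms, so the bare terms end up weighted toward N1. On a realistic corpus, a library solver would normally replace the hand-written trainer.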