Multi-class Classification of Cancer Stages from Free-text Histology Reports using Support Vector Machines

Abstract
Multi-class machine learning techniques using support vector machines (SVMs) are proposed to classify the TNM stage of lung cancer patients from their free-text histology reports. Stages obtained automatically can be used for retrospective population-level studies of lung cancer outcomes. While the system could in principle be applied to stage other cancer types, the paper focuses on lung cancer because of data availability. Experiments quantified system performance on a corpus of reports from 710 lung cancer patients using four different SVM architectures for multi-class classification. Results show that a system based on standard binary SVM classifiers organised in a hierarchical architecture shows the most promise, with overall accuracies of 0.64 and 0.82 for the T and N stages, respectively.
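
To make the idea of binary SVM classifiers organised in a hierarchy concrete, the following is a minimal sketch rather than the system evaluated in the paper. The abstract does not say how the hierarchy is arranged, so the cascade used here (stage k versus all later stages) is an assumption, as are the bag-of-words features, the hyperparameters, the helper names (featurize, train_linear_svm, train_cascade, predict) and the made-up toy reports. Each node is a linear SVM fitted by subgradient descent on the regularised hinge loss, written in plain Python so that the example is self-contained.

```python
# Illustrative sketch only. The cascade order, bag-of-words features,
# hyperparameters, and toy reports are assumptions, not the paper's setup.

STAGES = ["T1", "T2", "T3", "T4"]

def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary, plus a constant bias feature."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab] + [1.0]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM by subgradient descent on the L2-regularised hinge loss.

    Labels in y must be +1 or -1. For brevity the bias weight is regularised too.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * dot(w, xi) < 1.0
            # Shrink the weights (regulariser), then step towards any margin violator.
            w = [wj * (1.0 - lr * lam) + (lr * yi * xj if violated else 0.0)
                 for wj, xj in zip(w, xi)]
    return w

def train_cascade(texts, labels, vocab):
    """Train one binary SVM per split: stage k (+1) versus all later stages (-1)."""
    cascade = []
    for k, stage in enumerate(STAGES[:-1]):
        # Only reports at this stage or later reach node k of the hierarchy.
        subset = [(t, l) for t, l in zip(texts, labels) if STAGES.index(l) >= k]
        X = [featurize(t, vocab) for t, _ in subset]
        y = [1 if l == stage else -1 for _, l in subset]
        cascade.append((stage, train_linear_svm(X, y)))
    return cascade

def predict(text, cascade, vocab):
    """Walk down the cascade and stop at the first node that accepts the report."""
    x = featurize(text, vocab)
    for stage, w in cascade:
        if dot(w, x) >= 0.0:
            return stage
    return STAGES[-1]  # no node accepted the report, so it falls through to the last stage

# Made-up toy "reports"; real histology text would need proper preprocessing.
texts = ["small nodule", "small lesion", "medium mass", "medium lesion",
         "large mass", "large nodule", "invasive mass", "invasive lesion"]
labels = ["T1", "T1", "T2", "T2", "T3", "T3", "T4", "T4"]

vocab = sorted({tok for t in texts for tok in t.lower().split()})
cascade = train_cascade(texts, labels, vocab)
print(predict("small nodule", cascade, vocab))
```

A cascade of this kind needs only one fewer binary classifier than there are classes, and each node is trained on progressively fewer reports. The same structure could be reused for the N stage by swapping the label list.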
