An effective general purpose approach for automated biomedical document classification.
- 1 January 2006
- journal article
- Vol. 2006, 161-5
Abstract
Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost proportionate rejection sampling followed by application of a support vector machine (SVM), combining the resulting classifier results with voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general purpose approach for classifying biomedical text.
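The pipeline summarized in the abstract (chi-square feature selection, repeated cost proportionate rejection sampling, a classifier per sample, and voting) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. All function names, the toy documents, and the 20:1 cost ratio are assumptions made for illustration, and the base learner `train_term_counter` is a simple stand-in where the paper uses an SVM.

```python
import random
from collections import Counter


def chi_square(docs, labels, term):
    """Chi-square statistic of the 2x2 table: term present/absent vs. class 1/0."""
    a = b = c = d = 0
    for doc, y in zip(docs, labels):
        if term in doc:
            a, b = (a + 1, b) if y == 1 else (a, b + 1)
        else:
            c, d = (c + 1, d) if y == 1 else (c, d + 1)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest chi-square score."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t), reverse=True)
    return ranked[:k]


def rejection_sample(examples, costs, rng):
    """Cost proportionate rejection sampling: keep each example with
    probability cost / max(costs), so costly examples survive more often."""
    z = max(costs)
    return [ex for ex, cost in zip(examples, costs) if rng.random() < cost / z]


def train_term_counter(sample):
    """Stand-in base learner (NOT an SVM): +1 per term occurrence in a
    positive document, -1 per occurrence in a negative document."""
    weight = Counter()
    for doc, y in sample:
        for term in doc:
            weight[term] += 1 if y == 1 else -1
    return lambda doc: 1 if sum(weight[t] for t in doc) > 0 else 0


def train_voting_ensemble(examples, costs, train_fn, rounds, seed=0):
    """Train one model per independently drawn rejection sample."""
    rng = random.Random(seed)
    return [train_fn(rejection_sample(examples, costs, rng)) for _ in range(rounds)]


def vote(models, doc):
    """Majority vote over the ensemble's predictions."""
    return Counter(m(doc) for m in models).most_common(1)[0][0]


# Toy corpus: each document is a set of terms; label 1 is the rare, costly class.
docs = [{"gene", "protein"}, {"gene", "cell"}, {"trial", "cell"}, {"trial", "dose"}]
labels = [1, 1, 0, 0]

features = set(select_features(docs, labels, k=2))
examples = [(doc & features, y) for doc, y in zip(docs, labels)]
costs = [20 if y == 1 else 1 for y in labels]  # hypothetical 20:1 cost ratio

models = train_voting_ensemble(examples, costs, train_term_counter, rounds=5)
prediction = vote(models, {"gene", "protein"} & features)
```

An odd number of rounds avoids ties in a two-class vote. Under these assumptions, replacing `train_term_counter` with a function that fits an SVM on each sample would bring the sketch closer to the method the abstract describes.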