The Application of Rule-Based Methods to Class Prediction Problems in Genomics
- 1 October 2003
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 10 (5), 689-698
- https://doi.org/10.1089/106652703322539033
Abstract
We propose a method for constructing classifiers using logical combinations of elementary rules. The method is a form of rule-based classification, which has been widely discussed in the literature. In this work we focus specifically on issues that arise in the context of classifying cell samples based on RNA or protein expression measurements. The basic idea is to specify elementary rules that exhibit a locally strong pattern in favor of a single class. Strict admissibility criteria are imposed to produce a manageable universe of elementary rules. Then the elementary rules are combined using a set covering algorithm to form a composite rule that achieves a perfect fit to the training data. The user has explicit control over a parameter that determines the composite rule's level of redundancy and parsimony. This built-in control, along with the simplicity of interpreting the rules, makes the method particularly useful for classification problems in genomics. We demonstrate the new method using several microarray datasets and examine its generalization performance. We also draw comparisons to other machine-learning strategies such as CART, ID3, and C4.5.
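The set-covering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the covering algorithm or how the redundancy parameter is defined, so the code below is an assumption: a generic greedy set multi-cover heuristic in which each training sample of one class must be covered by at least `redundancy` elementary rules. The function name `greedy_cover`, the rule labels, and the sample indices are all hypothetical. Each rule is assumed to cover only samples of its own class, so the disjunction of the selected rules fits the training data for that class.

```python
from typing import Dict, List, Set


def greedy_cover(rule_coverage: Dict[str, Set[int]],
                 samples: Set[int],
                 redundancy: int = 1) -> List[str]:
    """Greedily select rules until every sample is covered `redundancy` times.

    Generic set multi-cover heuristic (an assumption, not the paper's algorithm).
    """
    # need[s] = how many more selected rules must still cover sample s
    need = {s: redundancy for s in samples}
    remaining = dict(rule_coverage)
    chosen: List[str] = []

    def gain(rule: str) -> int:
        # Number of still-undercovered samples this rule would help with
        return sum(1 for s in remaining[rule] if need.get(s, 0) > 0)

    while any(n > 0 for n in need.values()):
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("no admissible cover at this redundancy level")
        for s in remaining.pop(best):
            if need.get(s, 0) > 0:
                need[s] -= 1
        chosen.append(best)
    return chosen


# Hypothetical elementary rules for one class, mapped to the
# training-sample indices each rule covers.
rules = {
    "geneA>2.0": {0, 1},
    "geneB<0.5": {2, 3},
    "geneC>1.1": {1, 2},
    "geneD>3.0": {0, 3},
}
class_samples = {0, 1, 2, 3}

parsimonious = greedy_cover(rules, class_samples, redundancy=1)
redundant = greedy_cover(rules, class_samples, redundancy=2)
```

With `redundancy=1` the sketch returns a two-rule composite, while `redundancy=2` needs all four rules, which shows the trade-off between parsimony and redundancy that the abstract attributes to the user-controlled parameter.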