Simple decision rules for classifying human cancers from gene expression profiles
Open Access
- 16 August 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (20), 3896-3904
- https://doi.org/10.1093/bioinformatics/bti631
Abstract
Motivation: Various studies have shown that cancer tissue samples can be successfully detected and classified by their gene expression patterns using machine learning approaches. One of the challenges in applying these techniques for classifying gene expression data is to extract accurate, readily interpretable rules providing biological insight as to how classification is performed. Current methods generate classifiers that are accurate but difficult to interpret. This is the trade-off between credibility and comprehensibility of the classifiers. Here, we introduce a new classifier in order to address these problems. It is referred to as k-TSP (k–Top Scoring Pairs) and is based on the concept of ‘relative expression reversals’. This method generates simple and accurate decision rules that only involve a small number of gene-to-gene expression comparisons, thereby facilitating follow-up studies.

Results: In this study, we have compared our approach to other machine learning techniques for class prediction in 19 binary and multi-class gene expression datasets involving human cancers. The k-TSP classifier performs as efficiently as Prediction Analysis of Microarray and support vector machine, and outperforms other learning methods (decision trees, k-nearest neighbour and naïve Bayes). Our approach is easy to interpret as the classifier involves only a small number of informative genes. For these reasons, we consider the k-TSP method to be a useful tool for cancer classification from microarray gene expression data.

Availability: The software and datasets are available at http://www.ccbm.jhu.edu

Contact: actan@jhu.edu
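The abstract does not spell out how the gene-to-gene comparisons are scored, so the following is only a minimal sketch of the ‘relative expression reversals’ idea for the binary case. It assumes one common formulation: score each gene pair by how much the frequency of "gene i is expressed below gene j" differs between the two classes, keep up to k high-scoring pairs that share no gene, and classify by unweighted majority vote. The function names (`pair_scores`, `fit_ktsp`, `predict`), the 0/1 label encoding, the tie-breaking and the zero-score cut-off are all illustrative assumptions, not the authors' released software.

```python
from itertools import combinations


def pair_scores(X, y):
    """Score every gene pair by how strongly its ordering differs between classes.

    X: list of samples, each a list of expression values (one per gene).
    y: list of class labels (0 or 1), one per sample.
    Returns (score, i, j, p0, p1) tuples sorted by descending score.
    """
    n_genes = len(X[0])
    by_class = {c: [x for x, label in zip(X, y) if label == c] for c in (0, 1)}
    scored = []
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p = [sum(x[i] < x[j] for x in by_class[c]) / len(by_class[c])
             for c in (0, 1)]
        scored.append((abs(p[0] - p[1]), i, j, p[0], p[1]))
    scored.sort(key=lambda t: -t[0])  # stable sort: ties keep pair order
    return scored


def fit_ktsp(X, y, k=3):
    """Greedily keep up to k top-scoring pairs that share no gene."""
    used, rules = set(), []
    for score, i, j, p0, p1 in pair_scores(X, y):
        # Assumption: stop early once pairs carry no class information.
        if len(rules) == k or score == 0:
            break
        if i in used or j in used:
            continue
        # Class the pair votes for when x[i] < x[j]; it votes for the
        # other class when the ordering is reversed.
        vote_if_less = 0 if p0 > p1 else 1
        rules.append((i, j, vote_if_less))
        used.update((i, j))
    return rules


def predict(rules, x):
    """Majority vote over the pairwise comparisons (ties go to class 0)."""
    votes = [v if x[i] < x[j] else 1 - v for i, j, v in rules]
    return int(sum(votes) * 2 > len(votes))
```

Each fitted rule reads as a plain statement of the form "if gene i is expressed below gene j, vote for class c", which is the kind of interpretability the abstract emphasises. Choosing an odd k avoids tied votes in this sketch; how k is actually selected and how the multi-class case is handled are not covered here.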