MINIMUM REDUNDANCY FEATURE SELECTION FROM MICROARRAY GENE EXPRESSION DATA
- 1 April 2005
- journal article
- research article
- Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology
- Vol. 03 (02) , 185-205
- https://doi.org/10.1142/s0219720005001004
Abstract
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy-maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: Naïve Bayes, Linear discriminant analysis, Logistic regression, and Support vector machines.
Supplementary: The top 60 MRMR genes for each of the datasets are listed in . More information related to MRMR methods can be found at .
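The abstract names the MRMR idea but does not spell out a scoring rule, so the following is only a minimal sketch under stated assumptions: gene expression values are already discretized, relevance is the mutual information between a gene and the class label, redundancy is the average mutual information between a candidate gene and the genes already picked, and genes are added greedily by relevance minus redundancy. The function names `mutual_information` and `mrmr_select` and the toy genes `g1`, `g2`, `g3` are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed pairs of p(a,b) * log( p(a,b) / (p(a) * p(b)) )
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy selection sketch (assumed criterion: relevance minus mean redundancy).

    features: dict mapping a gene name to a list of discretized expression values.
    labels:   list of class labels, one per sample.
    k:        number of genes to select.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    # Start with the single most relevant gene.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        def score(name):
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (hypothetical): g2 duplicates g1, g3 is weaker but complementary.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],
    "g2": [0, 0, 0, 1, 1, 1, 1, 1],
    "g3": [0, 0, 1, 0, 1, 1, 1, 0],
}
picked = mrmr_select(features, labels, k=2)
top_ranked = sorted(
    features, key=lambda g: mutual_information(features[g], labels), reverse=True
)[:2]
print("MRMR-style pick:", picked)
print("Top-ranked pick:", top_ranked)
```

On this toy data, ranking by relevance alone keeps both copies of the same signal (`g1` and `g2`), while the greedy redundancy-penalized pick takes `g3` as its second gene. This mirrors the redundancy problem the abstract describes, not the authors' exact procedure or results.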