Redundancy based feature selection for microarray data
- 22 August 2004
- proceedings article
- Published by Association for Computing Machinery (ACM)
- pp. 737-742
- https://doi.org/10.1145/1014052.1014149
Abstract
In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem becomes particularly challenging due to the large number of features (genes) and the small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes from among the selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. We therefore study the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method, in comparison with representative methods, have been demonstrated through an empirical study using public microarray data sets.
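The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general relevance-then-redundancy idea it describes, built on assumptions of our own. It assumes genes are already discretized. It measures both gene-class relevance and gene-gene correlation with symmetrical uncertainty (SU). It treats a gene as redundant when an already-kept, at-least-as-relevant gene correlates with it at least as strongly as it correlates with the class. This is an illustrative filter in that spirit, not the paper's exact method. The function names `select_non_redundant`, `symmetrical_uncertainty`, and `entropy`, the `threshold` parameter, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 = independent, 1 = fully predictive."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: no information to share
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_non_redundant(features, labels, threshold=0.0):
    """Hypothetical relevance-then-redundancy filter (an assumption, not the paper's code).

    features: dict mapping gene name -> list of discretized expression values
    labels:   list of class labels, one per sample
    """
    # Step 1: relevance of each gene to the class; keep genes above the
    # threshold, ranked from most to least relevant.
    relevance = {g: symmetrical_uncertainty(v, labels) for g, v in features.items()}
    ranked = sorted((g for g in features if relevance[g] > threshold),
                    key=lambda g: relevance[g], reverse=True)

    # Step 2: walk the ranking; a gene is redundant if some already-kept gene
    # (which is at least as relevant) correlates with it at least as strongly
    # as the gene itself correlates with the class.
    selected = []
    for g in ranked:
        redundant = any(
            symmetrical_uncertainty(features[k], features[g]) >= relevance[g]
            for k in selected
        )
        if not redundant:
            selected.append(g)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    genes = {
        "g1": [0, 0, 0, 1, 1, 1, 1, 1],  # relevant, misclassifies one sample
        "g2": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of g1: redundant
        "g3": [0, 0, 0, 0, 0, 1, 1, 1],  # relevant, errs on a different sample
        "g4": [0, 0, 1, 1, 0, 0, 1, 1],  # independent of the class: irrelevant
    }
    print(sorted(select_non_redundant(genes, labels)))  # ['g1', 'g3']
```

With these toy data, `g4` falls below the relevance threshold because it is independent of the class, and `g2` is dropped as an exact copy of `g1`. `g1` and `g3` both survive because each carries class information the other lacks. Ranking by relevance alone would score `g2` exactly as high as `g1` even though it adds nothing new, which is the redundancy problem the abstract points to.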