Biomarker Identification by Feature Wrappers
- 1 November 2001
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 11 (11) , 1878-1887
- https://doi.org/10.1101/gr.190001
Abstract
Gene expression studies bridge the gap between DNA information and trait information by dissecting biochemical pathways into intermediate components between genotype and phenotype. These studies open new avenues for identifying complex disease genes and biomarkers for disease diagnosis and for assessing drug efficacy and toxicity. However, the majority of analytical methods applied to gene expression data are not efficient for biomarker identification and disease diagnosis. In this paper, we propose a general framework to incorporate feature (gene) selection into pattern recognition in the process of identifying biomarkers. Using this framework, we develop three feature wrappers that search through the space of feature subsets, using the classification error as the measure of goodness for a particular feature subset being “wrapped around”: linear discriminant analysis, logistic regression, and support vector machines. To carry out this computationally intensive search process effectively, we employ sequential forward search and sequential forward floating search algorithms. To evaluate the performance of feature selection for biomarker identification, we have applied the proposed methods to three data sets. The preliminary results demonstrate that very high classification accuracy can be attained by the identified composite classifiers with several biomarkers.
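The wrapper idea described in the abstract can be illustrated with a minimal sketch of sequential forward search: at each step, the gene whose addition gives the lowest classification error is added to the subset. This is not the authors' implementation. To stay dependency-free, a nearest-centroid classifier stands in for the linear discriminant analysis, logistic regression, and support vector machine classifiers named in the abstract, the error is estimated by leave-one-out cross-validation (an assumption), and all function names and the toy expression values are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the label of x from class centroids over the chosen features.

    Stand-in classifier (assumption): the paper wraps LDA, logistic
    regression, and SVMs instead.
    """
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label in sorted(sums):
        dist = sum(
            (x[f] - sums[label][k] / counts[label]) ** 2
            for k, f in enumerate(features)
        )
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(X, y, features):
    """Leave-one-out classification error for one feature subset."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) != y[i]:
            errors += 1
    return errors / len(X)


def sequential_forward_search(X, y, max_features):
    """Greedily add the feature that minimizes the wrapped classifier's error.

    Returns a list of (selected_features, error) pairs, one per step.
    """
    selected, history = [], []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # Score every candidate subset formed by adding one more feature.
        err, best = min((loo_error(X, y, selected + [f]), f) for f in remaining)
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), err))
    return history


# Toy data (made up for illustration): rows are samples, columns are genes.
# Only gene index 1 separates the two classes.
X = [
    [5.0, 1.0, 3.2], [4.1, 1.2, 0.5], [5.3, 0.8, 2.9], [3.9, 1.1, 0.7],
    [4.8, 3.0, 3.1], [4.2, 3.2, 0.4], [5.1, 2.9, 2.8], [4.0, 3.1, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

history = sequential_forward_search(X, y, max_features=2)
print(history[0])  # ([1], 0.0): gene index 1 is selected first
```

The sequential forward floating search mentioned in the abstract extends this loop with conditional backward steps that drop a previously selected feature when doing so lowers the error; that variant is not shown here.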