New gene selection method for classification of cancer subtypes considering within‐class variation

Abstract
In this work we propose a new method for finding gene subsets in microarray data that effectively discriminate subtypes of disease. To address a well‐known problem in gene selection, large within‐class variation, we developed a new criterion that measures the relevance of individual genes using the mean and standard deviation of the distances from each sample to its class centroid. The approach also has the advantage of being applicable not only to binary classification but also to multiclass problems. We demonstrated the performance of the method on two publicly available microarray datasets: leukemia (two classes) and small round blue cell tumors (four classes). The proposed method selects far fewer genes than previous methods without loss of discriminating power, and it can therefore facilitate further biological and clinical research.
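The abstract names the ingredients of the criterion (per‐class centroids, plus the mean and standard deviation of sample‐to‐centroid distances) but not the exact formula. The sketch below is therefore only an illustration under stated assumptions, not the authors' criterion. It assumes a genes × samples expression matrix and uses a hypothetical score: the smallest gap between class centroids divided by the largest within‐class spread, where spread is taken as (mean + standard deviation) of the distances. The function names `within_class_distance_stats`, `hypothetical_relevance` and `rank_genes` are invented for this example. Because it compares all class centroids, the same score applies to two‐class and multiclass problems.

```python
from statistics import mean, pstdev

def within_class_distance_stats(values, labels):
    """For one gene: per class, return (centroid, mean distance, std of distances),
    where distances are |x - class centroid| for the samples in that class."""
    stats = {}
    for cls in sorted(set(labels)):
        xs = [v for v, y in zip(values, labels) if y == cls]
        centroid = mean(xs)
        dists = [abs(x - centroid) for x in xs]
        stats[cls] = (centroid, mean(dists), pstdev(dists))
    return stats

def hypothetical_relevance(values, labels):
    """Hypothetical score (assumption, not the paper's formula):
    smallest gap between class centroids / largest (mean + std) within-class spread.
    Requires at least two classes."""
    stats = within_class_distance_stats(values, labels)
    centroids = sorted(c for c, _, _ in stats.values())
    min_gap = min(b - a for a, b in zip(centroids, centroids[1:]))
    spread = max(m + s for _, m, s in stats.values())
    if spread == 0:
        # No within-class variation: relevant only if the classes are separated.
        return float("inf") if min_gap > 0 else 0.0
    return min_gap / spread

def rank_genes(matrix, labels, top_k):
    """Return indices of the top_k genes (rows of matrix) by hypothetical score."""
    scored = [(hypothetical_relevance(row, labels), g) for g, row in enumerate(matrix)]
    scored.sort(reverse=True)
    return [g for _, g in scored[:top_k]]
```

For example, with six samples labeled `["ALL", "ALL", "ALL", "AML", "AML", "AML"]`, a gene whose values cluster tightly around two distant centroids outranks a gene with the same centroid in both classes, which scores 0. A real implementation would substitute the paper's actual criterion for `hypothetical_relevance`.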