Gene selection for microarray data analysis using principal component analysis
- 1 April 2005
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 24 (13), 2069-2087
- https://doi.org/10.1002/sim.2082
Abstract
Principal component analysis (PCA) has been widely used in multivariate data analysis to reduce the dimensionality of the data in order to simplify subsequent analysis and allow for summarization of the data in a parsimonious manner. It has become a useful tool in microarray data analysis. For a typical microarray data set, it is often difficult to compare the overall gene expression difference between observations from different groups or conduct the classification based on a very large number of genes. In this paper, we propose a gene selection method based on the strategy proposed by Krzanowski. We demonstrate the effectiveness of this procedure using a cancer gene expression data set and compare it with several other gene selection strategies. It turns out that the proposed method selects the best gene subset for preserving the original data structure. Copyright © 2005 John Wiley & Sons, Ltd.
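As a rough illustration of what structure-preserving variable selection can look like, the sketch below scores a candidate gene subset by the Procrustes distance between its principal component scores and those of the full data, then drops genes greedily. It is written in the spirit of Krzanowski's strategy as commonly described, and is an assumption about the general approach rather than the exact procedure of the paper. The function names `pc_scores`, `procrustes_m2`, and `backward_select`, and the parameters `k` (number of components retained) and `q` (number of genes kept), are hypothetical.

```python
import numpy as np


def pc_scores(X, k):
    """Return the n x k matrix of scores on the first k principal components."""
    Xc = X - X.mean(axis=0)                       # center each variable (gene)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]                       # scores = U * singular values


def procrustes_m2(Y, Z):
    """Procrustes residual sum of squares between two n x k configurations.

    Computed as trace(Y'Y) + trace(Z'Z) - 2 * (sum of singular values of Z'Y),
    i.e. the squared distance left after the best rotation of Z onto Y.
    """
    sing = np.linalg.svd(Z.T @ Y, compute_uv=False)
    return np.trace(Y.T @ Y) + np.trace(Z.T @ Z) - 2.0 * sing.sum()


def backward_select(X, k, q):
    """Greedy backward elimination down to q variables (assumes q >= k).

    At each step, remove the variable whose deletion distorts the
    k-dimensional PC configuration of the full data the least.
    """
    if q < k:
        raise ValueError("q must be at least k")
    Y = pc_scores(X, k)                           # reference configuration
    keep = list(range(X.shape[1]))
    while len(keep) > q:
        losses = []
        for j in keep:
            trial = [v for v in keep if v != j]
            losses.append(procrustes_m2(Y, pc_scores(X[:, trial], k)))
        keep.pop(int(np.argmin(losses)))          # drop the least informative
    return keep
```

With an expression matrix `X` of shape (samples, genes), `backward_select(X, k=2, q=50)` would return the column indices of 50 retained genes. Each elimination step recomputes one SVD per remaining gene, so for thousands of genes a pre-filtering step or a cheaper search would likely be needed in practice.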
This publication has 12 references indexed in Scilit:
- A statistical perspective on gene expression data analysis. Statistics in Medicine, 2003
- Block principal component analysis with application to gene microarray data classification. Statistics in Medicine, 2002
- Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. Journal of the American Statistical Association, 2002
- Variable selection and the interpretation of principal subspaces. Journal of Agricultural, Biological and Environmental Statistics, 2001
- A gene expression database for the molecular pharmacology of cancer. Nature Genetics, 2000
- A stopping rule for structure-preserving variable selection. Statistics and Computing, 1996
- Cross-Validation in Principal Component Analysis. Biometrics, 1987
- Selection of Variables to Preserve Multivariate Data Structure, Using Principal Components. Journal of the Royal Statistical Society Series C: Applied Statistics, 1987
- Discarding Variables in a Principal Component Analysis. II: Real Data. Journal of the Royal Statistical Society Series C: Applied Statistics, 1973
- Discarding Variables in a Principal Component Analysis. I: Artificial Data. Journal of the Royal Statistical Society Series C: Applied Statistics, 1972