DEVELOPING OPTIMAL PREDICTION MODELS FOR CANCER CLASSIFICATION USING GENE EXPRESSION DATA
- 1 January 2004
- journal article
- research article
- Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology
- Vol. 01 (04), 681-694
- https://doi.org/10.1142/s0219720004000351
Abstract
Microarrays can provide genome-wide expression patterns for various cancers, especially for tumor sub-types that may exhibit substantially different patient prognoses. Using such gene expression data, several approaches have been proposed to classify tumor sub-types accurately. These classification methods are not robust and often depend on the particular training sample used for modelling, which raises issues in using them to administer proper treatment to a future patient. We propose to construct an optimal, robust prediction model for classifying cancer sub-types using gene expression data. Our model is constructed in a step-wise fashion using cross-validated quadratic discriminant analysis. At each step, all identified models are validated on an independent sample of patients, so that the resulting model is robust for future data. We apply the proposed methods to two microarray cancer data sets: the acute leukemia data of Golub et al.³ and the colon cancer data of Alon et al.¹² We have found that the dimensionality of our optimal prediction models is relatively small for these data sets. Our prediction models with one or two gene factors outperform, or perform competitively with, other methods based on 50 or more predictive gene factors, especially on independent samples. The methodology is developed and implemented as procedures in R and Splus. The source code can be obtained at .
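The step-wise procedure described in the abstract can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' R/Splus code. It assumes greedy forward selection with the leave-one-out cross-validated error as the selection criterion. It also adds a small ridge term (`ridge`) to each class covariance for numerical stability and stops after a fixed number of genes (`max_genes`). All function names, parameters and the synthetic data are hypothetical.

```python
import numpy as np


def qda_fit(X, y, ridge=1e-3):
    """Estimate per-class mean, inverse covariance, log-determinant and log-prior."""
    n, p = X.shape
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        mu = Xk.mean(axis=0)
        # Assumption: a small ridge term keeps each class covariance invertible
        # when a class has few samples relative to the number of selected genes.
        cov = np.atleast_2d(np.cov(Xk, rowvar=False)) + ridge * np.eye(p)
        _, logdet = np.linalg.slogdet(cov)
        params[k] = (mu, np.linalg.inv(cov), logdet, np.log(len(Xk) / n))
    return params


def qda_predict(params, X):
    """Assign each row of X to the class with the largest quadratic discriminant score."""
    classes = list(params)
    scores = np.empty((X.shape[0], len(classes)))
    for j, k in enumerate(classes):
        mu, inv_cov, logdet, logprior = params[k]
        d = X - mu
        # Squared Mahalanobis distance of every row to the class mean.
        maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)
        scores[:, j] = -0.5 * logdet - 0.5 * maha + logprior
    return np.array(classes)[scores.argmax(axis=1)]


def loocv_error(X, y):
    """Leave-one-out cross-validated misclassification rate of QDA."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        params = qda_fit(X[keep], y[keep])
        wrong += int(qda_predict(params, X[i:i + 1])[0] != y[i])
    return wrong / n


def stepwise_select(X, y, max_genes=2):
    """Greedy forward selection: at each step add the gene with the lowest LOOCV error."""
    selected, remaining, history = [], list(range(X.shape[1])), []
    for _ in range(max_genes):
        best_err, best_gene = min(
            (loocv_error(X[:, selected + [g]], y), g) for g in remaining
        )
        selected.append(best_gene)
        remaining.remove(best_gene)
        history.append((list(selected), best_err))
    return history


def validate(history, X_train, y_train, X_test, y_test):
    """Refit every step-wise model on the training set and score it on an independent set."""
    results = []
    for genes, cv_err in history:
        params = qda_fit(X_train[:, genes], y_train)
        test_err = float(np.mean(qda_predict(params, X_test[:, genes]) != y_test))
        results.append((genes, cv_err, test_err))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 20)  # synthetic labels: 20 samples per class

    def simulate():
        X = rng.normal(size=(40, 50))  # 50 synthetic "genes"
        X[:, 3] += 6.0 * y  # only gene 3 carries class information
        return X

    X_train, X_test = simulate(), simulate()
    history = stepwise_select(X_train, y, max_genes=2)
    for genes, cv_err, test_err in validate(history, X_train, y, X_test, y):
        print(f"genes={genes} loocv_error={cv_err:.3f} test_error={test_err:.3f}")
```

At each step the sketch compares the cross-validated error with the independent-sample error. This shows whether a small gene set still holds up on new patients. The fixed `max_genes` cap stands in for the paper's own stopping rule, which the abstract does not specify.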
References indexed in Scilit (14 in total; 10 listed here):
- Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 2002
- Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 2002
- Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics, 2001
- Predicting the clinical status of human breast cancer by using gene expression profiles. Proceedings of the National Academy of Sciences, 2001
- Regression Modeling Strategies. Springer Nature, 2001
- Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences, 2000
- Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 2000
- Cancer cells, chemotherapy and gene clusters. Nature Genetics, 2000
- Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 1999
- Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 1999