Predicting Patient Survival from Microarray Data by Accelerated Failure Time Modeling Using Partial Least Squares and LASSO
- 1 March 2007
- journal article
- Published by Oxford University Press (OUP) in Biometrics
- Vol. 63 (1) , 259-271
- https://doi.org/10.1111/j.1541-0420.2006.00660.x
Abstract
We consider the problem of predicting survival times of cancer patients from the gene expression profiles of their tumor samples via linear regression modeling of log‐transformed failure times. The partial least squares (PLS) and least absolute shrinkage and selection operator (LASSO) methodologies are used for this purpose where we first modify the data to account for censoring. Three approaches to handling right‐censored data (reweighting, mean imputation, and multiple imputation) are considered. Their performances are examined in a detailed simulation study and compared with that of full data PLS and LASSO had there been no censoring. A major objective of this article is to investigate the performances of PLS and LASSO in the context of microarray data where the number of covariates is very large and there are extremely few samples. We demonstrate that LASSO outperforms PLS in terms of prediction error when the list of covariates includes a moderate to large percentage of useless or noise variables; otherwise, PLS may outperform LASSO. For a moderate sample size (100 with 10,000 covariates), LASSO performed better than a no covariate model (or noise‐based prediction). The mean imputation method appears to best track the performance of the full data PLS or LASSO. The mean imputation scheme is used on an existing data set on lung cancer. This reanalysis using the mean imputed PLS and LASSO identifies a number of genes that were known to be related to cancer or tumor activities from previous studies.
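As an illustration of the mean imputation idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that each censored log failure time is replaced by an estimate of its conditional mean beyond the censoring time, computed from the jumps of a Kaplan-Meier survival estimate. The function names `km_jumps` and `mean_impute_log_times` are hypothetical, and renormalizing by the tail mass (with a fallback to the censored value when no later event exists) is a simplifying assumption.

```python
import math


def km_jumps(times, events):
    """Return (event_time, jump) pairs of the Kaplan-Meier survival estimate.

    times  : observed follow-up times (failure or censoring), all > 0
    events : 1 if the time is an observed failure, 0 if right censored
    """
    surv = 1.0
    jumps = []
    # Only distinct observed failure times contribute jumps.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        new_surv = surv * (1.0 - deaths / at_risk)
        jumps.append((t, surv - new_surv))
        surv = new_surv
    return jumps


def mean_impute_log_times(times, events):
    """Replace each censored log time by an estimated conditional mean.

    Assumption (hypothetical simplification): the conditional mean of
    log T given T > c is the jump-weighted average of log failure times
    beyond c, renormalized by the Kaplan-Meier mass in that tail.
    """
    jumps = km_jumps(times, events)
    imputed = []
    for t, e in zip(times, events):
        if e:
            imputed.append(math.log(t))  # observed failure: keep as is
            continue
        tail = [(u, w) for u, w in jumps if u > t]
        mass = sum(w for _, w in tail)
        if mass > 0:
            imputed.append(sum(w * math.log(u) for u, w in tail) / mass)
        else:
            # No failure observed after this censoring time:
            # fall back to the censored value itself.
            imputed.append(math.log(t))
    return imputed


# Toy data: the second patient is censored at time 2.
y_star = mean_impute_log_times([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
```

In this toy example the censored observation is replaced by the average of log 3 and log 4, since the two later failures carry equal Kaplan-Meier mass. The imputed responses could then be passed, together with the expression matrix, to any standard PLS or LASSO fit.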
This publication has 30 references indexed in Scilit:
- Iterative Partial Least Squares with Right‐Censored Data Analysis: A Comparison to Other Dimension Reduction Techniques. Biometrics, 2005
- Partial Cox regression analysis for high-dimensional microarray gene expression data. Bioinformatics, 2004
- Prediction of Survival in Diffuse Large-B-Cell Lymphoma Based on the Expression of Six Genes. New England Journal of Medicine, 2004
- Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data. PLoS Biology, 2004
- Discriminant models for high‐throughput proteomics mass spectrometer data. Proteomics, 2003
- Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine, 2002
- The transcriptional map of the common eliminated region 1 (C3CER1) in 3p21.3. European Journal of Human Genetics, 2002
- Iteratively Reweighted Partial Least Squares Estimation for Generalized Linear Regression. Technometrics, 1996
- A Statistical View of Some Chemometrics Regression Tools. Technometrics, 1993