Integrating Biological Knowledge with Gene Expression Profiles for Survival Prediction of Cancer
- 1 February 2009
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 16 (2) , 265-278
- https://doi.org/10.1089/cmb.2008.12tt
Abstract
Due to the large variability in survival times between cancer patients and the plethora of genes on microarrays unrelated to outcome, building accurate prediction models that are easy to interpret remains a challenge. In this paper, we propose a general strategy for improving performance and interpretability of prediction models by integrating gene expression data with prior biological knowledge. First, we link gene identifiers in the expression dataset with gene annotation databases such as Gene Ontology (GO). Then we construct “supergenes” for each gene category by summarizing information from genes related to outcome using a modified principal component analysis (PCA) method. Finally, instead of using genes as predictors, we use these supergenes representing information from each gene category as predictors to predict survival outcome. In addition to identifying gene categories associated with outcome, the proposed approach also carries out additional within-category selection to select important genes within each gene set. We show, using two real breast cancer microarray datasets, that the prediction models constructed based on gene set (or pathway) information outperform the prediction models based on expression values of single genes, with improved prediction accuracy and interpretability.
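The supergene idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's modified PCA, whose details the abstract does not give. It assumes a simpler stand-in: within each gene category, keep the genes most correlated with a numeric outcome score, then summarize them by their first principal component. The function name `supergene`, the parameter `top_k`, the category layout, and the synthetic data are all hypothetical. With censored survival times, a Cox-model score would likely replace the plain correlation used here.

```python
import numpy as np

def supergene(expr, outcome, gene_idx, top_k=2):
    """Summarize one gene category as a single 'supergene' (illustrative only).

    expr     : array (samples x genes) of expression values
    outcome  : array (samples,) numeric outcome score (stand-in for survival)
    gene_idx : list of column indices belonging to the category
    top_k    : number of outcome-related genes kept within the category
    """
    sub = expr[:, gene_idx]
    sub = sub - sub.mean(axis=0)              # center each gene
    y = outcome - outcome.mean()

    # Within-category selection: absolute Pearson correlation with outcome.
    denom = np.linalg.norm(sub, axis=0) * np.linalg.norm(y)
    score = np.abs(sub.T @ y) / np.where(denom == 0, 1.0, denom)
    keep = np.argsort(score)[::-1][:top_k]

    # First principal component of the selected genes, via SVD.
    u, s, _ = np.linalg.svd(sub[:, keep], full_matrices=False)
    return u[:, 0] * s[0], [gene_idx[i] for i in keep]

# Synthetic data (assumption): genes 0-2 track a latent factor, genes 3-7 are noise.
rng = np.random.default_rng(0)
n = 60
latent = rng.normal(size=n)
expr = rng.normal(size=(n, 8))
expr[:, :3] = latent[:, None] + 0.3 * rng.normal(size=(n, 3))
outcome = latent + 0.5 * rng.normal(size=n)

# Hypothetical gene categories, standing in for GO annotations.
categories = {"category_A": [0, 1, 2, 3], "category_B": [4, 5, 6, 7]}

supergene_a, genes_a = supergene(expr, outcome, categories["category_A"])
supergene_b, genes_b = supergene(expr, outcome, categories["category_B"])
print("selected in A:", genes_a)
print("selected in B:", genes_b)
```

The resulting per-category vectors would then serve as predictors in a survival model in place of the individual genes, and the list of selected genes gives the within-category selection the abstract mentions.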