Pathway-based identification of SNPs predictive of survival
- 2 February 2011
- journal article
- research article
- Published by Springer Nature in European Journal of Human Genetics
- Vol. 19 (6), 704-709
- https://doi.org/10.1038/ejhg.2011.3
Abstract
In recent years, several association analysis methods for case-control studies have been developed. However, as we turn towards the identification of single nucleotide polymorphisms (SNPs) for prognosis, there is a need to develop methods for the identification of SNPs in high dimensional data with survival outcomes. Traditional methods for the identification of SNPs have some drawbacks. First, the majority of the approaches for case-control studies are based on single SNPs. Second, SNPs that are identified without incorporating biological knowledge are more difficult to interpret. Random forests has been found to perform well in gene expression analysis with survival outcomes. In this paper we present the first pathway-based method to correlate SNP with survival outcomes using a machine learning algorithm. We illustrate the application of pathway-based analysis of SNPs predictive of survival with a data set of 192 multiple myeloma patients genotyped for 500 000 SNPs. We also present simulation studies that show that the random forests technique with log-rank score split criterion outperforms several other machine learning algorithms. Thus, pathway-based survival analysis using machine learning tools represents a promising approach for the identification of biologically meaningful SNPs associated with disease. © 2011 Macmillan Publishers Limited. All rights reserved.
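To make the idea of a survival-based split concrete, the following is a minimal sketch of how a tree node might score a candidate split on a single SNP. It uses the standard two-sample log-rank statistic, which is related to but not necessarily identical to the log-rank score criterion evaluated in the paper. The function names, the additive 0/1/2 genotype coding, and the toy data are assumptions made for illustration and are not taken from the article.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic for one candidate split.

    times   : follow-up time per patient
    events  : 1 if the event (e.g. death) was observed, 0 if censored
    in_left : True if the patient falls in the left child node
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_left = sum(1 for i in at_risk if in_left[i])
        died = [i for i in at_risk if times[i] == t and events[i]]
        d = len(died)
        d_left = sum(1 for i in died if in_left[i])
        # Observed minus expected events in the left node at time t.
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            # Hypergeometric variance of the left-node event count.
            p = n_left / n
            variance += d * p * (1 - p) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_genotype_split(times, events, genotypes):
    """Try the splits 'genotype <= c' for c in {0, 1} (assumed 0/1/2 coding)
    and return (c, statistic) for the split with the largest statistic."""
    best = (None, 0.0)
    for c in (0, 1):
        in_left = [g <= c for g in genotypes]
        if all(in_left) or not any(in_left):
            continue  # split leaves one child empty
        stat = logrank_statistic(times, events, in_left)
        if stat > best[1]:
            best = (c, stat)
    return best


# Toy data (hypothetical): four patients, all with observed events.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 1, 1]
toy_genotypes = [0, 0, 1, 2]
toy_split = best_genotype_split(toy_times, toy_events, toy_genotypes)
```

In a random survival forest, a scoring step of this kind would be repeated over a random subset of SNPs at each node of each tree; restricting the candidate SNPs to those mapped to one pathway is one way such an analysis could be made pathway-based.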
This publication has 48 references indexed in Scilit:
- Gene and pathway-based second-wave analysis of genome-wide association studies. European Journal of Human Genetics, 2009
- Diverse Genome-wide Association Studies Associate the IL12/IL23 Pathway with Crohn Disease. American Journal of Human Genetics, 2009
- Using genome‐wide pathway analysis to unravel the etiology of complex diseases. Genetic Epidemiology, 2009
- A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals. American Journal of Human Genetics, 2009
- Caspase polymorphisms and genetic susceptibility to multiple myeloma. Hematological Oncology, 2008
- Pathway-Based Approaches for Analysis of Genomewide Association Studies. American Journal of Human Genetics, 2007
- A second generation human haplotype map of over 3.1 million SNPs. Nature, 2007
- Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine, 2005
- Identifying SNPs predictive of phenotype using random forests. Genetic Epidemiology, 2004
- Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 2003