Bioinformatics challenges for genome-wide association studies
Open Access
- 6 January 2010
- journal article
- review article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (4) , 445-455
- https://doi.org/10.1093/bioinformatics/btp713
Abstract
Motivation: The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype–phenotype relationship that is characterized by significant heterogeneity and gene–gene and gene–environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods.
Contact: jason.h.moore@dartmouth.edu