An Open Access Database of Genome-wide Association Results
Open Access
- 22 January 2009
- journal article
- research article
- Published by Springer Nature in BMC Medical Genetics
- Vol. 10 (1), 1-17
- https://doi.org/10.1186/1471-2350-10-6
Abstract
The number of genome-wide association studies (GWAS) is growing rapidly, leading to the discovery and replication of many new disease loci. Combining results from multiple GWAS datasets may strengthen previous conclusions and suggest new disease loci, pathways or pleiotropic genes. However, no database or centralized resource currently exists that contains anywhere near the full scope of GWAS results. We collected available results from 118 GWAS articles into a database of 56,411 significant SNP-phenotype associations and accompanying information, and we make this database freely available here. In doing so, we encountered a number of challenges to creating an open access database of GWAS results, which we describe here. Through preliminary analyses and characterization of available GWAS, we demonstrate the potential to gain new insights by querying a database across GWAS. Using a genomic bin-based density analysis to search for highly associated regions of the genome, we detected positive control loci (e.g., MHC loci) with high sensitivity. Likewise, an analysis of SNPs reported repeatedly across GWAS identified replicated loci (e.g., APOE, LPL). At the same time, we identified novel, highly suggestive loci for a variety of traits that did not meet genome-wide significance thresholds in prior analyses, in some cases with strong support from the primary medical genetics literature (SLC16A7, CSMD1, OAS1), suggesting these genes merit further study. Additional adjustment for linkage disequilibrium within most regions with a high density of GWAS associations did not materially alter our findings. Having a centralized database with standardized gene annotation also allowed us to examine the representation of functional gene categories (gene ontologies) containing one or more associations among top GWAS results.
Genes relating to cell adhesion functions were highly over-represented among significant associations (p < 4.6 × 10⁻¹⁴), a finding that was not altered by a sensitivity analysis. We provide access to a full gene-annotated GWAS database which could be used for further querying, analyses or integration with other genomic information. We make a number of general observations. Of reported associated SNPs, 40% lie within the boundaries of a RefSeq gene and 68% are within 60 kb of one, indicating a bias toward gene-centricity in the findings. We found considerable heterogeneity in the information available from GWAS, suggesting the wider community could benefit from standardization and centralization of results reporting.
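The genomic bin-based density analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bin width, the function names and the toy coordinates below are all assumptions chosen for illustration, and the abstract does not state the bin size or scoring actually used. The idea is simply to count reported associations per fixed-width genomic window and rank windows by count.

```python
from collections import Counter

# Hypothetical bin width in base pairs; the abstract does not give the real value.
BIN_SIZE = 1_000_000


def bin_density(associations, bin_size=BIN_SIZE):
    """Count associations falling in each (chromosome, bin index) window."""
    counts = Counter()
    for chrom, pos in associations:
        counts[(chrom, pos // bin_size)] += 1
    return counts


def densest_bins(counts, top_n=3):
    """Return the top_n bins ranked by number of associations."""
    return counts.most_common(top_n)


# Toy (chromosome, position) pairs, invented for illustration only.
associations = [
    ("6", 32_100_000), ("6", 32_400_000), ("6", 32_600_000),
    ("6", 31_500_000),
    ("19", 45_410_000), ("19", 45_420_000),
    ("8", 19_800_000),
]

counts = bin_density(associations)
for (chrom, idx), n in densest_bins(counts, top_n=2):
    start = idx * BIN_SIZE
    print(f"chr{chrom}:{start}-{start + BIN_SIZE - 1} has {n} associations")
```

A real analysis would also need to account for linkage disequilibrium within dense bins, as the abstract notes, since correlated SNPs inflate raw counts.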