is-rSNP: a novel technique for in silico regulatory SNP detection
Open Access
- 4 September 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (18), i524-i530
- https://doi.org/10.1093/bioinformatics/btq378
Abstract
Motivation: Determining the functional impact of non-coding disease-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) is challenging. Many of these SNPs are likely to be regulatory SNPs (rSNPs): variations which affect the ability of a transcription factor (TF) to bind to DNA. However, experimental procedures for identifying rSNPs are expensive and labour intensive. Therefore, in silico methods are required for rSNP prediction. By scoring two alleles with a TF position weight matrix (PWM), it can be determined which SNPs are likely rSNPs. However, predictions made in this manner are noisy, and no method exists that determines the statistical significance of a nucleotide variation on a PWM score.
Results: We have designed an algorithm for in silico rSNP detection called is-rSNP. We employ novel convolution methods to determine the complete distributions of PWM scores and ratios between allele scores, facilitating assignment of statistical significance to rSNP effects. We have tested our method on 41 experimentally verified rSNPs, correctly predicting the disrupted TF in 28 cases. We also analysed 146 disease-associated SNPs with no known functional impact in an attempt to identify candidate rSNPs. Of the 11 significantly predicted disrupted TFs, 9 had previous evidence of being associated with the disease in the literature. These results demonstrate that is-rSNP is suitable for high-throughput screening of SNPs for potential regulatory function. This is a useful and important tool in the interpretation of GWAS.
Availability: is-rSNP software is available for use at: www.genomics.csse.unimelb.edu.au/is-rSNP
Contact: gmaci@csse.unimelb.edu.au; adam.kowalczyk@nicta.com.au
Supplementary information: Supplementary data are available at Bioinformatics online.
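The idea of scoring two alleles with a PWM and reading significance off a complete score distribution can be illustrated with a minimal Python sketch. It is not the is-rSNP implementation: the toy integer-valued matrix `pwm`, the uniform `background`, the example alleles `ACG` and `ACT`, and the helper names `score`, `score_distribution` and `p_value` are all assumptions made for illustration. The sketch builds the exact distribution of scores for random sequences by convolving one matrix column at a time, then looks up the tail probability of each allele's score. It does not cover the distribution of ratios between allele scores that the abstract also mentions.

```python
from collections import defaultdict

BASES = "ACGT"

# Hypothetical 3-column PWM with integer scores (illustrative values only).
pwm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]

# Assumed background model: each base equally likely, positions independent.
background = {b: 0.25 for b in BASES}


def score(pwm, seq):
    """Sum the per-position PWM scores of a sequence as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, seq))


def score_distribution(pwm, background):
    """Exact score distribution for background sequences.

    Starts from a point mass at score 0 and convolves in one column
    at a time, so the result maps each reachable score to its probability.
    """
    dist = {0: 1.0}
    for col in pwm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for base in BASES:
                nxt[s + col[base]] += p * background[base]
        dist = dict(nxt)
    return dist


def p_value(dist, observed):
    """Probability that a background sequence scores at least `observed`."""
    return sum(p for s, p in dist.items() if s >= observed)


dist = score_distribution(pwm, background)
ref_score = score(pwm, "ACG")  # allele matching the toy motif
alt_score = score(pwm, "ACT")  # allele with the last base changed
print(ref_score, p_value(dist, ref_score))
print(alt_score, p_value(dist, alt_score))
```

Integer scores keep the number of distinct score values small, which is what makes the column-by-column convolution cheap; real-valued matrices would need to be discretised first, and the choice of discretisation is left open here.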