Data mining of public SNP databases for the selection of intragenic SNPs
- 21 August 2002
- journal article
- database
- Published by Hindawi Limited in Human Mutation
- Vol. 20 (3) , 162-173
- https://doi.org/10.1002/humu.10107
Abstract
Different strategies to search public single nucleotide polymorphism (SNP) databases for intragenic SNPs were evaluated. First, we assembled a strategy to annotate SNPs onto candidate genes based on a BLAST search of public SNP databases (Intragenic SNP Annotation by BLAST, ISAB). Only BLAST hits that complied with stringent criteria for 1) percentage identity (minimum 98%), 2) BLAST hit length (the hit covers at least 98% of the length of the SNP entry in the database, or the hit is longer than 250 base pairs), and 3) location in non-repetitive DNA were considered valid SNPs. We assessed the intragenic context and redundancy of these SNPs, and demonstrated that the SNP content of the dbSNP and HGBASE/HGVbase databases is highly complementary but also overlaps significantly. Second, we assessed the validity of intragenic SNP annotation available on the dbSNP and HGVbase websites by comparison with the results of the ISAB strategy. Only a minority of all annotated SNPs was found in common between the respective public SNP database websites and the ISAB annotation strategy. A detailed analysis was performed to explain this discrepancy. In conclusion, we recommend the application of an independent strategy (such as ISAB) to annotate intragenic SNPs, complementary to the annotation provided at the dbSNP and HGVbase websites. Such an approach might be useful in the selection process of intragenic SNPs for genotyping in genetic studies. Hum Mutat 20:162–173, 2002.
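The three hit-acceptance criteria stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `BlastHit` record, its field names, and the `is_valid_snp_hit` function are hypothetical, and the `in_repeat` flag is assumed to come from a separate repeat-annotation step that the abstract does not describe. Only the thresholds (98% identity, 98% coverage, 250 base pairs) are taken from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract.
MIN_IDENTITY_PCT = 98.0   # minimum percentage identity of the BLAST hit
MIN_COVERAGE_PCT = 98     # hit must cover at least this % of the SNP entry
LONG_HIT_BP = 250         # ...or be longer than this many base pairs


@dataclass
class BlastHit:
    """Hypothetical record for one BLAST hit of a SNP entry on a gene."""
    snp_id: str
    percent_identity: float
    hit_length: int        # aligned length in base pairs
    snp_entry_length: int  # full length of the SNP database entry
    in_repeat: bool        # assumed to be supplied by a repeat-masking step


def is_valid_snp_hit(hit: BlastHit) -> bool:
    """Apply the three criteria: identity, hit length, non-repetitive DNA."""
    # Criterion 1: percentage identity of at least 98%.
    if hit.percent_identity < MIN_IDENTITY_PCT:
        return False
    # Criterion 2: hit covers >= 98% of the entry, or is longer than 250 bp.
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    covers_entry = hit.hit_length * 100 >= MIN_COVERAGE_PCT * hit.snp_entry_length
    is_long_hit = hit.hit_length > LONG_HIT_BP
    if not (covers_entry or is_long_hit):
        return False
    # Criterion 3: the hit must lie in non-repetitive DNA.
    return not hit.in_repeat


def filter_hits(hits):
    """Return only the hits that satisfy all three criteria."""
    return [h for h in hits if is_valid_snp_hit(h)]
```

In this sketch a short SNP entry passes only if nearly all of it aligns, while a long entry can pass on a partial alignment provided the aligned stretch exceeds 250 base pairs, which matches the "or" in the second criterion.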