Single nucleotide differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies
- 18 December 2009
- journal article
- research article
- Published by Hindawi Limited in Human Mutation
- Vol. 31 (1) , 67-73
- https://doi.org/10.1002/humu.21137
Abstract
The creation of single nucleotide polymorphism (SNP) databases (such as NCBI dbSNP) has facilitated scientific research in many fields. SNP discovery and detection have improved to the extent that over 17 million human reference (rs) SNPs have been reported to date (Build 129 of dbSNP). Unfortunately, SNP databases are not always complete and/or accurate. In fact, half of the reported SNPs are still only candidate SNPs and have not been validated in a population. We describe the identification of single nucleotide differences (SNDs) in humans that may contaminate the dbSNP database. These SNDs, reported as real SNPs in the database, do not exist as such, but are merely artifacts due to the presence of a paralogous (highly similar, duplicated) sequence in the genome. Using sequencing, we showed how SNDs could originate in two paralogous genes and evaluated samples from a population of 100 individuals for the presence/absence of SNPs. Moreover, using bioinformatics, we predicted as many as 8.32% of the biallelic, coding SNPs in the dbSNP database to be SNDs. Our identification of SNDs in the database will not only allow researchers to select truly informative SNPs for association studies, but also aid in determining accurate SNP genotypes and haplotypes. Hum Mutat 31:67–73, 2010.