Mining SNPs from DNA Sequence Data; Computational Approaches to SNP Discovery and Analysis
- 5 August 2009
- book chapter
- Published by Springer Nature
- Vol. 578, 73-91
- https://doi.org/10.1007/978-1-60327-411-1_4
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation and are the basis for most molecular markers. Before these SNPs can be used for direct sequence-based SNP detection or in a derived SNP assay, they need to be identified. For those regions or species where no validated SNPs are available in the public databases, a good alternative is to mine them from DNA sequences. The alignment of multiple sequence fragments originating from different genotypes representing the same region on the genome will allow for the discovery of sequence variants. The corresponding nucleotide mismatches are likely to be SNPs or insertions/deletions. A large amount of sequence data to be mined is present in the public databases (both expressed sequence tags and genomic sequences) and is free to use without having to do large-scale sequencing oneself. However, with the appearance of the next-generation sequencing machines (Roche GS/454, Illumina GA/Solexa, SOLiD), high-throughput sequencing is becoming widely available. This will allow for the sequencing of polymorphic genotypes on specific target areas and consequent SNP identification. In this paper we discuss the bioinformatics tools required to analyze DNA sequence data for SNP mining. A general approach for the consecutive steps in the mining process is described and commonly used SNP discovery pipelines are presented.
This publication has 41 references indexed in Scilit:
- HaploSNPer: a web-based allele and SNP detection tool. BMC Genomic Data, 2008
- SOAP: short oligonucleotide alignment program. Bioinformatics, 2008
- Complexity Reduction of Polymorphic Sequences (CRoPS™): A Novel Approach for Large-Scale Polymorphism Discovery in Complex Genomes. PLOS ONE, 2007
- SNP discovery via 454 transcriptome sequencing. The Plant Journal, 2007
- SNP mining porcine ESTs with MAVIANT, a novel tool for SNP evaluation and annotation. Bioinformatics, 2007
- SNPdetector: A Software Tool for Sensitive and Accurate SNP Detection. PLoS Computational Biology, 2005
- BLAT—The BLAST-Like Alignment Tool. Genome Research, 2002
- A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 2001
- AFLP: a new technique for DNA fingerprinting. Nucleic Acids Research, 1995
- Basic local alignment search tool. Journal of Molecular Biology, 1990