A general approach to single-nucleotide polymorphism discovery
- 1 December 1999
- journal article
- letter
- Published by Springer Nature in Nature Genetics
- Vol. 23 (4) , 452-456
- https://doi.org/10.1038/70570
Abstract
Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits (ref. 1). The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2, 3, 4, 5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence (refs 6, 7) as a template on which to layer often unmapped, fragmentary sequence data (refs 8, 9, 10, 11) and to use base quality values (ref. 12) to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel, Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
This publication has 23 references indexed in Scilit:
- New Goals for the U.S. Human Genome Project: 1998-2003. Science, 1998
- Shotgun Sequencing of the Human Genome. Science, 1998
- Large-Scale Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms in the Human Genome. Science, 1998
- The Homozygous Complete Hydatidiform Mole: A Unique Resource for Genome Studies. Genomics, 1997
- Variations on a Theme: Cataloging Human DNA Sequence Variation. Science, 1997
- Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data. Genome Research, 1996
- Generation and analysis of 280,000 human expressed sequence tags. Genome Research, 1996
- Comparative Analysis of Human DNA Variations by Fluorescence-Based Sequencing of PCR Products. Genomics, 1994
- Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nature Genetics, 1993
- LII. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S. Philosophical Transactions of the Royal Society of London, 1763