Sequence Determination from Overlapping Fragments: A Simple Model of Whole-Genome Shotgun Sequencing
- 28 January 2002
- journal article
- research article
- Published by American Physical Society (APS) in Physical Review Letters
- Vol. 88 (6) , 068106
- https://doi.org/10.1103/physrevlett.88.068106
Abstract
Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general problem we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments.
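The recovery question posed in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's model or either of its assembly strategies. It assumes evenly spaced fragments with a constant overlap (the paper samples fragments randomly), and it counts a sequence as recovered only when each fragment has exactly one possible successor. The function names `make_fragments`, `assemble`, and `recovery_rate` are hypothetical and chosen for illustration.

```python
import random

def make_fragments(seq, frag_len, overlap):
    # Assumption: fragments tile the sequence at a fixed step, so
    # consecutive fragments share exactly `overlap` characters.
    step = frag_len - overlap
    return [seq[i:i + frag_len] for i in range(0, len(seq) - frag_len + 1, step)]

def assemble(fragments, overlap):
    # Conservative check: return the assembled string only if the chain of
    # fragments is unambiguous; return None otherwise. Requires overlap >= 1.
    n = len(fragments)
    # A start fragment has a prefix that matches no other fragment's suffix.
    starts = [
        i for i in range(n)
        if not any(j != i and fragments[j][-overlap:] == fragments[i][:overlap]
                   for j in range(n))
    ]
    if len(starts) != 1:
        return None
    current = starts[0]
    unused = set(range(n)) - {current}
    result = fragments[current]
    while unused:
        tail = fragments[current][-overlap:]
        candidates = [j for j in unused if fragments[j][:overlap] == tail]
        if len(candidates) != 1:
            return None  # dead end or ambiguous continuation
        current = candidates[0]
        unused.remove(current)
        result += fragments[current][overlap:]
    return result

def recovery_rate(alphabet, seq_len, frag_len, overlap, trials=200, seed=0):
    # Monte Carlo estimate of how often a random sequence is recovered.
    # Choose seq_len so that the fragments cover the whole sequence.
    rng = random.Random(seed)
    recovered = 0
    for _ in range(trials):
        seq = "".join(rng.choice(alphabet) for _ in range(seq_len))
        frags = make_fragments(seq, frag_len, overlap)
        rng.shuffle(frags)
        recovered += assemble(frags, overlap) == seq
    return recovered / trials
```

Under these assumptions, calling `recovery_rate("ACGT", 10, 4, 2)` and then repeating with a larger alphabet gives a rough feel for how alphabet size affects ambiguity: repeated overlap strings become rarer as the alphabet grows, which is consistent with the abstract treating the infinite-alphabet case as exactly solvable.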