Poisson Process Approximation for Sequence Repeats, and Sequencing by Hybridization
- 1 January 1996
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 3 (3) , 425-463
- https://doi.org/10.1089/cmb.1996.3.425
Abstract
Sequencing by hybridization is a tool to determine a DNA sequence from the unordered list of all l-tuples contained in this sequence; typical numbers for l are l = 8, 10, 12. For theoretical purposes we assume that the multiset of all l-tuples is known. This multiset determines the DNA sequence uniquely if none of the so-called Ukkonen transformations are possible. These transformations require repeats of (l – 1)-tuples in the sequence, with these repeats occurring in certain spatial patterns. We model DNA as an i.i.d. sequence. We first prove Poisson process approximations for the process of indicators of all leftmost long repeats allowing self-overlap and for the process of indicators of all leftmost long repeats without self-overlap. Using the Chen–Stein method, we get bounds on the error of these approximations. As a corollary, we approximate the distribution of longest repeats. In the second step we analyze the spatial patterns of the repeats. Finally we combine these two steps to prove an approximation for the probability that a random sequence is uniquely recoverable from its list of l-tuples. For all our results we give some numerical examples including error bounds.
Key words: sequencing by hybridization, sequence repeats, DNA sequences, Chen–Stein method, Poisson process approximation, Ukkonen transformations.
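The setup described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: it only builds the multiset of l-tuples and checks for repeated (l – 1)-tuples, which the abstract names as a prerequisite for any Ukkonen transformation. The function names are hypothetical, and the uniform distribution over A, C, G, T is an assumption (the abstract only says the sequence is i.i.d.). Because repeats must also occur in certain spatial patterns before a transformation is possible, the simulated frequency of "no repeat at all" is only a conservative stand-in for the unique-recovery probability, not an estimate of it.

```python
from collections import Counter
import random


def spectrum(seq, l):
    """Multiset of all l-tuples (length-l substrings) contained in seq."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def repeated_tuples(seq, k):
    """k-tuples that start at more than one position (self-overlap allowed)."""
    return {word for word, count in spectrum(seq, k).items() if count > 1}


def has_no_short_repeat(seq, l):
    """True if no (l - 1)-tuple repeats, so no Ukkonen transformation can apply."""
    return not repeated_tuples(seq, l - 1)


def estimate_no_repeat_frequency(n, l, trials=1000, seed=0):
    """Monte Carlo frequency of length-n sequences with no repeated (l - 1)-tuple.

    Assumption: letters are drawn uniformly from A, C, G, T.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(n))
        hits += has_no_short_repeat(seq, l)
    return hits / trials


if __name__ == "__main__":
    print(spectrum("ACGTACG", 3))
    print(repeated_tuples("ACGTACG", 3))
    print(estimate_no_repeat_frequency(n=200, l=8))
```

Counting occurrences with a `Counter` keeps the multiset view from the abstract explicit: two sequences with the same `spectrum` are indistinguishable to an idealized hybridization experiment.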