An Overview on the Distribution of Word Counts in Markov Chains
- 1 February 2000
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 7 (1-2), 193-201
- https://doi.org/10.1089/10665270050081469
Abstract
In this paper, we give an overview of existing results on the statistical distribution of word counts in a Markovian sequence of letters. We present results on the number of overlapping occurrences, the number of renewals, and the number of clumps, for counts of single words as well as multiple words. Most of the results are approximations valid as the length of the sequence tends to infinity. We will see that Gaussian approximations give way to (compound) Poisson approximations for rare words. When DNA sequences or proteins are modeled by stationary Markov chains, these results can be used to study the statistical frequency of motifs in a given sequence.
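The abstract names three ways of counting a word without defining them. The following minimal Python sketch illustrates the distinction under the definitions commonly used in the word-count literature, which are assumed here rather than taken from the abstract: an overlapping occurrence is any position where the word starts, a renewal is an occurrence that does not overlap the previously counted renewal (scanning left to right), and a clump is a maximal run of mutually overlapping occurrences. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
def occurrence_positions(seq, word):
    """Return the start index of every occurrence of word in seq, overlaps included."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        # Restart the search one letter further so overlapping matches are found.
        start = seq.find(word, start + 1)
    return positions


def count_renewals(positions, word_len):
    """Count occurrences that do not overlap the previously counted renewal."""
    count = 0
    next_free = 0  # first index not covered by the last counted renewal
    for pos in positions:
        if pos >= next_free:
            count += 1
            next_free = pos + word_len
    return count


def count_clumps(positions, word_len):
    """Count maximal runs of overlapping occurrences (assumed clump definition)."""
    count = 0
    clump_end = -1  # index of the last letter covered by the current clump
    for pos in positions:
        if pos > clump_end:
            count += 1  # this occurrence starts past the current clump
        clump_end = pos + word_len - 1
    return count


seq = "ATATATATGGATAT"  # illustrative sequence, not data from the paper
word = "ATAT"
positions = occurrence_positions(seq, word)
print(len(positions))                        # overlapping occurrences: 4
print(count_renewals(positions, len(word)))  # renewals: 3
print(count_clumps(positions, len(word)))    # clumps: 2
```

For a self-overlapping word such as `ATAT` the three counts differ, whereas for a word that cannot overlap itself they coincide. This is one reason the limiting distributions of the three counts are treated separately.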