String matching bounds via coding
Open Access
- 1 January 1997
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Probability
- Vol. 25 (1) , 329-336
- https://doi.org/10.1214/aop/1024404290
Abstract
It is known that the length $L(x_1^n)$ of the longest block appearing at least twice in a randomly chosen sample path of length $n$ drawn from an i.i.d. process is asymptotically almost surely equal to $C \log n$, where the constant $C$ depends on the process. A simple coding argument will be used to show that for a class of processes called the finite energy processes, $L(x_1^n)$ is almost surely upper bounded by $C \log n$, where $C$ is a constant. While the coding technique does not yield the exact constant $C$, it does show clearly what is needed to obtain log $n$ bounds.
Keywords
This publication has 7 references indexed in Scilit:
- The Ergodic Theory of Discrete Sample PathsPublished by American Mathematical Society (AMS) ,1996
- Entropy and data compression schemesIEEE Transactions on Information Theory, 1993
- String Matching: The Ergodic CaseThe Annals of Probability, 1992
- Entropy and PrefixesThe Annals of Probability, 1992
- How Sampling Reveals a ProcessThe Annals of Probability, 1990
- The Erdos-Renyi Strong Law for Pattern Matching with a Given Proportion of MismatchesThe Annals of Probability, 1989
- Estimating the information content of symbol sequences and efficient codesIEEE Transactions on Information Theory, 1989