Counts of long aligned word matches among random letter sequences

Abstract
Asymptotic distributional properties are given for the maximal-length aligned word (a contiguous run of letters) among multiple random Markov-dependent sequences composed of letters from a finite alphabet. For sequences of length N, C_{r,s}(N), defined as the length of the longest common aligned word found in r or more of s sequences, has order of growth log N/(−log λ), where λ is the maximal eigenvalue of the r-Schur product matrices from among the collection of Markov matrices that generate the sequences. The count Z_{r,s}(N, k) of positions that initiate an aligned match of length exceeding k = log N/(−log λ) + x, but fail to match at the immediately preceding position, has a limiting Poisson distribution. Distributional properties of other long aligned word relationships and patterns are also discussed.
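The two statistics named in the abstract can be illustrated with a brute-force sketch for small inputs. This is not the authors' method. The function names (run_length, longest_aligned_word, count_declumped_matches) are hypothetical, and the reading of "fail to match at the immediately preceding position" for general r is an assumption: a start position i is counted only if no group of at least r sequences agrees on the whole stretch from i − 1 through i + k. For two sequences this reduces to a plain mismatch at position i − 1.

```python
from collections import Counter


def run_length(seqs, i, r):
    """Longest word starting at position i that at least r of the
    sequences share at the same (aligned) positions."""
    n = min(len(s) for s in seqs)
    length = 0
    while i + length < n:
        # Group the sequences by the candidate word of length `length + 1`.
        counts = Counter(s[i:i + length + 1] for s in seqs)
        if max(counts.values()) < r:
            break
        length += 1
    return length


def longest_aligned_word(seqs, r):
    """Finite-sample analogue of C_{r,s}(N): the maximum run length over
    all start positions."""
    n = min(len(s) for s in seqs)
    return max((run_length(seqs, i, r) for i in range(n)), default=0)


def count_declumped_matches(seqs, r, k):
    """Finite-sample analogue of Z_{r,s}(N, k) under the assumption stated
    in the lead-in: count starts i whose match has length > k and is not
    the continuation of a match of length > k + 1 starting at i - 1."""
    n = min(len(s) for s in seqs)
    count = 0
    for i in range(n):
        if run_length(seqs, i, r) <= k:
            continue
        if i == 0 or run_length(seqs, i - 1, r) <= k + 1:
            count += 1
    return count
```

For example, with the three toy sequences "ABCDAB", "ABCEAB" and "XBCDAA" and r = 2, the longest aligned word has length 4 (the word BCDA, shared by the first and third sequences from position 1). The cost grows roughly as s·N·L² for matches of length L, so this is suitable only for checking small cases by hand.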
