Exact computation of pattern probabilities in random sequences generated by Markov chains

31 December 1989

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 6 (4) , 347-353
https://doi.org/10.1093/bioinformatics/6.4.347

Abstract

Observed patterns in macromolecular sequences are often considered as words and compared with their probabilities of occurring in random sequences. Calculation of these probabilities, however, often lacks rigour. We have developed an algorithm for exact computation of such probabilities for stochastic sequences that follow a Markov chain model. The method is applicable to the case in which a random sequence contains one of two given patterns P and Q, or both simultaneously. Another application yields the probability function P(x) that a sequence contains pattern P exactly x times. An application to patterns that include wild-card characters yields probabilities for homonucleotide clusters of a given length. We prove the probability of multiple runs of single nucleotides in the SV40 genome to be in accordance with the dinucleotide composition of the sequence, although it is in conflict with mononucleotide composition.
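The abstract does not spell out the algorithm, so the following is only a minimal sketch of one standard way to obtain such an exact probability, not the authors' method. It assumes a first-order Markov chain and a single pattern without wild cards, and it tracks the joint distribution of the last emitted character and the state of a pattern-matching automaton. The names `kmp_automaton`, `prob_contains`, `initial`, and `transition` are hypothetical and chosen for illustration.

```python
def kmp_automaton(pattern, alphabet):
    """Return delta, where delta[state][ch] is the length of the longest
    prefix of `pattern` that is a suffix of the text read so far plus `ch`.
    State len(pattern) means "pattern seen" and is made absorbing."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            if state == m:
                delta[state][ch] = m  # once found, stay found
                continue
            candidate = pattern[:state] + ch
            k = min(m, len(candidate))
            # Shrink k until the last k characters match a pattern prefix.
            while k > 0 and candidate[-k:] != pattern[:k]:
                k -= 1
            delta[state][ch] = k
    return delta


def prob_contains(pattern, n, initial, transition):
    """Exact probability that a length-n sequence from a first-order Markov
    chain contains `pattern` at least once.

    initial[c]       : probability that the first character is c (assumed input)
    transition[a][b] : probability that b follows a (assumed input)
    """
    if n < 1:
        return 0.0
    alphabet = sorted(initial)
    m = len(pattern)
    delta = kmp_automaton(pattern, alphabet)

    # dist maps (last character, automaton state) to probability mass.
    dist = {}
    for ch in alphabet:
        key = (ch, delta[0][ch])
        dist[key] = dist.get(key, 0.0) + initial[ch]

    # Extend the sequence one character at a time.
    for _ in range(n - 1):
        nxt = {}
        for (last, state), p in dist.items():
            for ch in alphabet:
                key = (ch, delta[state][ch])
                nxt[key] = nxt.get(key, 0.0) + p * transition[last][ch]
        dist = nxt

    # Mass in the absorbing state is the probability of at least one match.
    return sum(p for (_, state), p in dist.items() if state == m)


if __name__ == "__main__":
    bases = "ACGT"
    init = {c: 0.25 for c in bases}
    trans = {a: {b: 0.25 for b in bases} for a in bases}
    print(prob_contains("AA", 3, init, trans))
```

With uniform, independent nucleotides the sketch can be checked by hand: of the 64 sequences of length 3, seven contain "AA", so the result should equal 7/64. Two patterns P and Q, occurrence counts, and wild cards, which the abstract also mentions, would need a larger state space and are not covered here.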
