Statistical analysis of nucleotide sequences
- 1 January 1990
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 18 (22), 6641-6647
- https://doi.org/10.1093/nar/18.22.6641
Abstract
In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting the appropriate parameters in order for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k ≠ 0, as is usually the case in databases. As a test of these modifications, we show that in Escherichia coli sequences there is a bias against palindromic hexamers which correspond to known restriction enzyme recognition sites.
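To make the idea of a Markov-chain expectation for word counts concrete, the sketch below estimates how often a hexamer would be expected under a generic order-k Markov chain fitted to several short sequences, and checks whether the word is palindromic (equal to its own reverse complement). It is an illustration of the general technique only and does not reproduce the specific model or parameter choices of the paper. The function names (`is_palindrome`, `count_words`, `expected_count`) and the toy input are assumptions introduced here.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_palindrome(word):
    """True if the word equals its own reverse complement (e.g. GAATTC)."""
    return word == "".join(COMPLEMENT[b] for b in reversed(word))


def count_words(seqs, length):
    """Count overlapping words of a given length, never spanning two sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - length + 1):
            counts[s[i:i + length]] += 1
    return counts


def expected_count(word, seqs, k):
    """Expected occurrences of `word` under an order-k Markov chain.

    Generic estimator (an assumption, not the paper's formula):
    N(w[0:k+1]) times the product over later positions of
    N(w[i:i+k+1]) / N(w[i:i+k]), with counts pooled over all sequences.
    """
    if not 1 <= k < len(word):
        raise ValueError("k must satisfy 1 <= k < len(word)")
    longer = count_words(seqs, k + 1)   # (k+1)-mers: context plus next base
    shorter = count_words(seqs, k)      # k-mers: the contexts themselves
    expected = float(longer[word[:k + 1]])
    for i in range(1, len(word) - k):
        context = shorter[word[i:i + k]]
        if context == 0:
            return 0.0                  # context never observed
        expected *= longer[word[i:i + k + 1]] / context
    return expected


# Toy usage with a made-up sequence; real input would be database entries.
seqs = ["GAATTCGAATTC"]
observed = count_words(seqs, 6)["GAATTC"]
print(is_palindrome("GAATTC"), observed, expected_count("GAATTC", seqs, 4))
```

A ratio of observed to expected counts well below 1 for a palindromic hexamer would point in the direction of the bias the abstract reports, although a proper test would also need a variance estimate, which this sketch omits.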