An Efficient Statistic to Detect Over- and Under-represented Words in DNA Sequences
- 1 January 1997
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 4 (2) , 189-192
- https://doi.org/10.1089/cmb.1997.4.189
Abstract
In this note, we point out a very efficient statistic to detect over- and under-represented words in DNA sequences, when Markov chain models are used to represent the sequences. This statistic is missing from the recent review done on this important problem and appears to be a better measure of rarity and abundance of words in DNA sequences.
Key words: DNA sequences, word counts, unexpected frequencies, Gaussian approximation, Markov chains.
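The abstract does not give the statistic itself. As a rough illustration of the kind of Gaussian-approximation score used for word counts under a Markov chain model, the following is a minimal sketch that assumes the maximal-order model (order equal to the word length minus 2) and estimates the expected count and its variance from the counts of the word's sub-words. The function names `count_words` and `word_z_score` are hypothetical, and the formula is an assumption chosen for illustration; it is not necessarily the statistic proposed in this note.

```python
from collections import Counter
from math import sqrt


def count_words(seq, k):
    """Count overlapping occurrences of every k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_z_score(seq, word):
    """Illustrative z-score for `word` under a Markov model of order len(word) - 2.

    Assumption (not taken from the abstract): the expected count is
    N(prefix) * N(suffix) / N(core), and the variance estimate is
    expected * (N(core) - N(prefix)) * (N(core) - N(suffix)) / N(core)**2.
    """
    h = len(word)
    if h < 3:
        raise ValueError("word must have at least 3 letters")

    n_word = count_words(seq, h)[word]
    n_prefix = count_words(seq, h - 1)[word[:-1]]   # w1 .. w(h-1)
    n_suffix = count_words(seq, h - 1)[word[1:]]    # w2 .. wh
    n_core = count_words(seq, h - 2)[word[1:-1]]    # w2 .. w(h-1)
    if n_core == 0:
        raise ValueError("the inner sub-word never occurs in the sequence")

    expected = n_prefix * n_suffix / n_core
    variance = expected * (n_core - n_prefix) * (n_core - n_suffix) / n_core ** 2
    if variance <= 0:
        raise ValueError("variance estimate is zero; score is undefined")

    return (n_word - expected) / sqrt(variance)


seq = "ACGACGTCATCA"
print(word_z_score(seq, "ACG"))  # 2.0  (observed 2, expected 1: over-represented)
print(word_z_score(seq, "TCG"))  # -2.0 (observed 0, expected 1: under-represented)
```

A large positive score marks a word seen more often than the model predicts, and a large negative score marks one seen less often. On a toy sequence this short the Gaussian approximation is not meaningful; the example only shows the arithmetic.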
This publication has 3 references indexed in Scilit:
- Over- and Underrepresentation of Short DNA Words in Herpesvirus Genomes. Journal of Computational Biology, 1996
- Exceptional Motifs in Different Markov Chain Models for a Statistical Analysis of DNA Sequences. Journal of Computational Biology, 1995
- Finding Words with Unexpected Frequencies in Deoxyribonucleic Acid Sequences. Journal of the Royal Statistical Society Series B: Statistical Methodology, 1995