Improving the efficiency of dot-matrix similarity searches through use of an oligomer table
- 1 January 1986
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 14 (1), 597-610
- https://doi.org/10.1093/nar/14.1.597
Abstract
Dot-matrix sequence similarity searches can be greatly speeded up through use of a table listing all locations of short oligomers in one of the sequences to find potential similarities with a second sequence. The algorithm described finds similarities between two sequences of lengths M and N, comparing L residues at a time, with an efficiency of L × M × N / S^k, where S is the alphabet size and k is the length of the oligomer. For nucleic acids, in which S = 4, use of a tetranucleotide table (k = 4) results in an efficiency of L × M × N / 256. The simplicity of the approach allows for a straightforward calculation of the level of similarities expected to be found for given search parameters. Furthermore, the storage required is minimal, allowing for even large sequences to be compared on small microcomputers. Theoretical considerations regarding the use of this search are discussed.
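The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`build_oligomer_table`, `dot_matrix_hits`), the `min_identities` threshold, and the choice to score a window by counting identical residues are all assumptions made for illustration. The table maps each k-mer of the first sequence to its positions, so a full window comparison of L residues is only made where the leading k-mer of the second sequence already matches.

```python
from collections import defaultdict


def build_oligomer_table(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def dot_matrix_hits(seq_a, seq_b, k=4, window=8, min_identities=6):
    """Return (i, j, identities) for windows that start with a shared k-mer.

    Assumes window >= k. Hypothetical scoring: a window is reported when
    at least min_identities of its residues are identical.
    """
    table = build_oligomer_table(seq_a, k)
    hits = []
    for j in range(len(seq_b) - window + 1):
        # Only positions in seq_a sharing the leading k-mer are examined.
        for i in table.get(seq_b[j:j + k], ()):
            if i + window > len(seq_a):
                continue  # window would run past the end of seq_a
            identities = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if identities >= min_identities:
                hits.append((i, j, identities))
    return hits


if __name__ == "__main__":
    print(dot_matrix_hits("ACGTACGTTT", "GGACGTACGA", min_identities=7))
```

Under the simplifying assumption of uniformly random sequences, each k-mer from the second sequence matches about M / S^k table entries, which is where the S^k divisor in the stated efficiency comes from. A consequence of this sketch's design is that a similar window whose first k residues do not match exactly is never examined.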