Improving the efficiency of dot-matrix similarity searches through use of an oligomer table

1 January 1986

journal article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 14 (1) , 597-610
https://doi.org/10.1093/nar/14.1.597

Abstract

Dot-matrix sequence similarity searches can be greatly sped up by using a table that lists all locations of short oligomers in one of the sequences to find potential similarities with a second sequence. The algorithm described finds similarities between two sequences of lengths M and N, comparing L residues at a time, with an efficiency of L × M × N / S^k, where S is the alphabet size and k is the length of the oligomer. For nucleic acids, in which S = 4, use of a tetranucleotide table (k = 4) results in an efficiency of L × M × N / 256. The simplicity of the approach allows for a straightforward calculation of the level of similarities expected to be found for given search parameters. Furthermore, the storage required is minimal, allowing even large sequences to be compared on small microcomputers. Theoretical considerations regarding the use of this search are discussed.
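
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_matches` threshold, and the choice to seed each window on its first k residues are assumptions made for illustration. One sequence is indexed by oligomer, and only positions that share an oligomer with the second sequence are compared over the full window of L residues.

```python
from collections import defaultdict


def build_oligomer_table(seq, k):
    """Map every oligomer of length k in seq to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def dot_matrix_hits(seq_a, seq_b, k=4, window=8, min_matches=None):
    """Return (i, j) dots where the windows starting at seq_a[i] and seq_b[j]
    agree in at least min_matches of `window` residues.

    Assumption of this sketch: a window is examined only if its first k
    residues match exactly, which is what the table lookup provides.
    """
    if window < k:
        raise ValueError("window must be at least k")
    if min_matches is None:
        min_matches = window  # hypothetical default: require an exact match
    table = build_oligomer_table(seq_a, k)
    hits = []
    for j in range(len(seq_b) - window + 1):
        # Only positions of seq_a sharing this oligomer are compared in full.
        for i in table.get(seq_b[j:j + k], ()):
            if i + window > len(seq_a):
                continue
            a_win = seq_a[i:i + window]
            b_win = seq_b[j:j + window]
            if sum(x == y for x, y in zip(a_win, b_win)) >= min_matches:
                hits.append((i, j))
    return hits


def expected_comparisons(m, n, window, alphabet_size=4, k=4):
    """Evaluate the abstract's estimate L x M x N / S^k."""
    return window * m * n / alphabet_size ** k


seq_a = "ACGTACGTTTGA"
seq_b = "GGACGTACGTCC"
exact = dot_matrix_hits(seq_a, seq_b)                    # exact 8-residue windows
relaxed = dot_matrix_hits(seq_a, seq_b, min_matches=7)   # allow one mismatch
estimate = expected_comparisons(1000, 1000, window=10)
```

For two sequences of 1000 residues compared 10 residues at a time, the estimate is 10 × 1000 × 1000 / 256, or roughly 39,000, against 10,000,000 without the table.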
