A test for the statistical significance of DNA sequence similarities for application in databank searches

1 April 1989

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 5 (2), 123-131
https://doi.org/10.1093/bioinformatics/5.2.123

Abstract

A method based on word-searching is developed that provides a rapid test for the statistical significance of DNA sequence similarities for use in databank searching. The method makes allowance for the lengths and dinucleotide compositions of the sequences being compared. A way is also described to calculate the power of the test, i.e. the probability of detecting a given similarity as being statistically significant. The effects of the scoring method, word length, sequence length, and sequence composition on the power of the test are examined. A novel scoring method is shown to be superior to the method currently used in most word-searching algorithms.
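
To illustrate the general idea of a word-based significance test, here is a minimal sketch. It is not the method of the paper. It assumes a simple score (the number of matching word pairs between two sequences) and estimates significance empirically by shuffling one sequence. A plain shuffle preserves only base composition, not the dinucleotide composition that the abstract says the method allows for. The function names `word_counts`, `shared_word_score`, and `empirical_p_value`, and the defaults for `k` and `trials`, are hypothetical choices made for this example.

```python
import random
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_word_score(a, b, k):
    """Number of matching word pairs: sum over words of count_a * count_b."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    return sum(n * cb[w] for w, n in ca.items() if w in cb)


def empirical_p_value(a, b, k=4, trials=200, seed=0):
    """Estimate how often a shuffled copy of b scores at least as high as b.

    Assumption: shuffling b keeps its base composition only, so this null
    model is cruder than one that also preserves dinucleotide frequencies.
    """
    rng = random.Random(seed)
    observed = shared_word_score(a, b, k)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if shared_word_score(a, "".join(letters), k) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (trials + 1)


if __name__ == "__main__":
    query = "ACGTACGTTGCAACGTTAGC"
    target = "TTGCAACGTTAGCACGTACG"
    score, p = empirical_p_value(query, target, k=4)
    print(score, p)
```

In the same spirit, the power of such a test could be estimated by planting a similarity of known strength into random sequences many times and recording the fraction of trials in which the p-value falls below a chosen threshold.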
