A test for the statistical significance of DNA sequence similarities for application in databank searches
- 1 April 1989
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 5 (2), 123-131
- https://doi.org/10.1093/bioinformatics/5.2.123
Abstract
A method is developed, based on word-searching, which provides a rapid test for the statistical significance of DNA sequence similarities for use in databank searching. The method makes allowance for the lengths and dinucleotide compositions of the sequences being compared. A way is also described to calculate the power of the test, i.e. the probability of detecting a given similarity as being statistically significant. The effects on the power of the test of the scoring method, word length, sequence length, and sequence composition are examined. A novel scoring method is shown to be superior to the method currently used in most word-searching algorithms.
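To make the idea of a word-based significance test concrete, the following is a minimal sketch and not the method of the article. It assumes a simple score (the number of position pairs at which two sequences share an identical word of length `k`) and estimates significance by shuffling one sequence, which preserves its length and single-base composition only; the article's allowance for dinucleotide composition and its novel scoring method are not reproduced here. The function names `word_counts`, `shared_word_score` and `shuffle_p_value`, and the defaults `k=4` and `trials=200`, are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping word (k-mer) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_word_score(a, b, k):
    """Number of (position in a, position in b) pairs with identical k-words."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    return sum(n * cb[w] for w, n in ca.items() if w in cb)


def shuffle_p_value(a, b, k=4, trials=200, seed=0):
    """Monte Carlo p-value for the observed score.

    Assumption: the null model is a random permutation of b, which keeps
    its length and base composition but NOT its dinucleotide composition.
    """
    rng = random.Random(seed)
    observed = shared_word_score(a, b, k)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if shared_word_score(a, "".join(letters), k) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# "ACGT" occurs twice in the first sequence and once in the second.
print(shared_word_score("ACGTACGT", "ACGT", 4))  # 2
```

A small p-value from `shuffle_p_value` suggests that the two sequences share more words than expected by chance under this simplified null model. Because the shuffle is repeated many times per comparison, it is far slower than an analytical test of the kind the abstract describes.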