On the statistical assessment of similarities in DNA sequences
Open Access
- 1 January 1984
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 12 (13) , 5529-5543
- https://doi.org/10.1093/nar/12.13.5529
Abstract
The statistical behavior of the similarity score for unrelated DNA sequences, calculated either by letter-by-letter comparison or from various forms of optimal alignment, was studied. It was found that natural DNA sequences from a database and truly random sequences show the same statistical behavior in terms of such scores. This makes it possible to adopt a simple criterion for rejecting fortuitous similarity. The criterion is based on the mean and standard deviation of chance scores, whose expected values depend on chain length, gap penalty and probability of letter coincidence and may be calculated from formulae given in the paper.
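The abstract does not reproduce the paper's formulae, so the following is only a minimal sketch of the simplest case it mentions, the letter-by-letter comparison without gaps. It assumes independent positions with a fixed coincidence probability p, under which the number of matches over n positions is binomial with mean np and standard deviation sqrt(np(1 - p)). The function names, the default p = 0.25 (four equiprobable letters) and the idea of reading the result as a number of standard deviations above chance are illustrative assumptions, not taken from the paper; the optimal-alignment scores with gap penalties discussed in the abstract are not covered here.

```python
import math


def chance_match_stats(n, p):
    """Mean and standard deviation of the match count for two unrelated
    sequences of length n, assuming each position matches independently
    with probability p (textbook binomial model, not the paper's formulae)."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return mean, sd


def count_matches(a, b):
    """Letter-by-letter comparison: number of positions where a and b agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    return sum(x == y for x, y in zip(a, b))


def z_score(a, b, p=0.25):
    """Observed match count expressed in chance standard deviations above
    the chance mean. p = 0.25 assumes four equiprobable letters."""
    mean, sd = chance_match_stats(len(a), p)
    return (count_matches(a, b) - mean) / sd
```

A score many standard deviations above the chance mean would then be hard to attribute to fortuitous similarity; where to place the cutoff is a choice left to the reader, and for real sequences p would be better estimated from the observed base composition than fixed at 0.25.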
This publication has 9 references indexed in Scilit:
- Optimal sequence alignments. Proceedings of the National Academy of Sciences, 1983
- Rapid similarity searches of nucleic acid and protein data banks. Proceedings of the National Academy of Sciences, 1983
- An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences. Nucleic Acids Research, 1982
- Pattern recognition in nucleic acid sequences. I. A general method for finding local homologies and symmetries. Nucleic Acids Research, 1982
- The sequence of human serum albumin cDNA and its expression in E. coli. Nucleic Acids Research, 1981
- A surprising new protein superfamily containing ovalbumin, antithrombin-III, and alpha1-proteinase inhibitor. Biochemical and Biophysical Research Communications, 1980
- A test for nucleotide sequence homology. Journal of Molecular Biology, 1973
- Matching Sequences under Deletion/Insertion Constraints. Proceedings of the National Academy of Sciences, 1972
- A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970