Rapid and accurate estimates of statistical significance for sequence data base searches.
- 24 May 1994
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 91 (11) , 4625-4628
- https://doi.org/10.1073/pnas.91.11.4625
Abstract
A central question in sequence comparison is the statistical significance of an observed similarity. For local alignment containing gaps to optimize sequence similarity this problem has so far not been solved mathematically. Using as a basis the Chen-Stein theory of Poisson approximation, we present a practical method to approximate the probability that a local alignment score is a result of chance alone. For a set of similarity scores and gap penalties only one simulation of random alignments needs to be calculated to derive the key information allowing us to estimate the significance of any alignment calculated under this setting. We present applications to data base searching and the analysis of pairwise and self-comparisons of proteins.
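The idea in the abstract, simulating random alignments once for a fixed scoring scheme and reusing the result to judge any observed score, can be illustrated with a minimal Python sketch. This is not the authors' procedure. It assumes that the best local score of two random sequences follows an extreme-value (Gumbel) tail of the form P(S >= t) = 1 - exp(-exp(-lam * (t - u))), which is the shape a Poisson approximation of rare high-scoring events produces, and it fits `lam` and `u` by a simple method of moments. The scoring values, the DNA alphabet, and all function names are hypothetical choices made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def local_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for x in a:
        cur = [0.0]
        for j, y in enumerate(b, start=1):
            s = match if x == y else mismatch
            v = max(0.0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(v)
            if v > best:
                best = v
        prev = cur
    return best


def simulate_scores(n_pairs, length, alphabet="ACGT", seed=0):
    """Scores of n_pairs alignments of independent uniform random sequences."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        a = [rng.choice(alphabet) for _ in range(length)]
        b = [rng.choice(alphabet) for _ in range(length)]
        scores.append(local_score(a, b))
    return scores


def fit_gumbel(scores):
    """Method-of-moments estimate of (lam, u); an assumed estimator,
    not the one described in the paper."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    lam = math.pi / math.sqrt(6.0 * var)
    u = mean - EULER_GAMMA / lam
    return lam, u


def p_value(t, lam, u):
    """Approximate P(S >= t) = 1 - exp(-exp(-lam * (t - u)))."""
    return -math.expm1(-math.exp(-lam * (t - u)))


if __name__ == "__main__":
    null_scores = simulate_scores(n_pairs=200, length=50)
    lam, u = fit_gumbel(null_scores)
    observed = local_score("ACGTACGTACGTACGT", "TTACGTACGTACGTAA")
    print(f"lam={lam:.3f} u={u:.3f} p={p_value(observed, lam, u):.3g}")
```

The fitted parameters hold only for the sequence lengths, letter frequencies, scores, and gap penalty used in the simulation. Applying them to sequences of other lengths would need an explicit length-dependent term, which this sketch omits.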