Statistical evaluation of the coding capacity of complementary DNA strands
Open Access
- 25 June 1984
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 12 (12), 5049-5059
- https://doi.org/10.1093/nar/12.12.5049
Abstract
Two independent methods are used to evaluate the protein-coding information content in different classes of DNA sequences. The first method evaluates the statistical relevance of finding unidentified reading frames longer than 100 codons on both DNA strands of: a) 117 DNA sequences that code for 142 nuclear proteins; b) 39 stable RNA coding sequences; and c) 36 other DNA sequences, which include regulatory sequences and sequences of as yet unknown function. The finding of 50 reading frames longer than 100 codons (complementary inverted proteins, or c.i.p. genes) located on the DNA strand complementary to the protein-coding one is far in excess of the number predicted by chance alone. An independent method (testcode), which assigns a probability of coding to a given sequence, was applied to the c.i.p. gene sequences and predicts that more than 50% of these genes are translated into a functional product. These analyses indicate the existence of a new class of protein-coding genes, located on the DNA strand complementary to the protein-coding strand.
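The first method rests on scanning both strands for long reading frames, which can be illustrated with a minimal Python sketch. This is not the authors' program. It assumes that a "reading frame" is any run of codons uninterrupted by a stop codon (no start codon required), and the chance estimate assumes equiprobable, independent bases, which the abstract does not specify. All function names are hypothetical.

```python
# Illustrative sketch only: the definitions below are assumptions,
# not the procedure used in the paper.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def open_frames(seq, min_codons=100):
    """Yield (frame, start, n_codons) for each stop-free run of at
    least min_codons codons in the three frames of one strand."""
    seq = seq.upper()
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                n = (pos - run_start) // 3
                if n >= min_codons:
                    yield frame, run_start, n
                run_start = pos + 3  # next run begins after the stop
            pos += 3
        # Run that reaches the end of the sequence without a stop.
        n = (pos - run_start) // 3
        if n >= min_codons:
            yield frame, run_start, n


def open_frames_both_strands(seq, min_codons=100):
    """Scan the given strand and its reverse complement."""
    return {
        "sense": list(open_frames(seq, min_codons)),
        "complementary": list(
            open_frames(reverse_complement(seq), min_codons)
        ),
    }


def p_stop_free(n_codons, p_stop=3 / 64):
    """Probability that n_codons consecutive random codons contain no
    stop, assuming uniform, independent bases (3 stops in 64 codons)."""
    return (1 - p_stop) ** n_codons
```

Under the uniform-base assumption, `p_stop_free(100)` comes out below 1%, which gives a rough sense of why many frames longer than 100 codons on the complementary strand would be unexpected by chance. A real analysis would need to account for the base composition of each sequence.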