Statistical evaluation of the coding capacity of complementary DNA strands

Open Access

25 June 1984

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 12 (12) , 5049-5059
https://doi.org/10.1093/nar/12.12.5049

Abstract

Two independent methods are used to evaluate the protein-coding information content of different classes of DNA sequences. The first method evaluates the statistical significance of finding unidentified reading frames longer than 100 codons on both DNA strands of: a) 117 DNA sequences that code for 142 nuclear proteins; b) 39 sequences coding for stable RNAs; and c) 36 other DNA sequences, which include regulatory sequences and sequences of as yet unknown function. The finding of 50 reading frames longer than 100 codons (complementary inverted proteins, or c.i.p. genes) located on the DNA strand complementary to the protein-coding one is far in excess of the number predicted by chance alone. An independent method (testcode), which assigns a probability of coding to a given sequence, was applied to the c.i.p. gene sequences and predicts that more than 50% of these genes are translated into a functional product. These analyses indicate the existence of a new class of protein-coding genes, located on the DNA strand complementary to the protein-coding strand.
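The first method can be illustrated with a minimal Python sketch. It is not the authors' program: the abstract does not define how a reading frame is delimited, so the sketch assumes a reading frame is a stop-free run of codons, uses the standard stop codons TAA, TAG and TGA, and takes the chance expectation from a simple model of independent, equally frequent bases. All function names are hypothetical.

```python
# Illustrative sketch only; names and the reading-frame definition are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_free_runs(seq, min_codons=100):
    """Yield (frame, start, end, n_codons) for each stop-free run of at
    least min_codons codons in the three frames of seq.

    Assumption: a "reading frame" is any run of codons without a stop
    codon; no start codon is required.
    """
    seq = seq.upper()
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - run_start) // 3
                if n_codons >= min_codons:
                    yield (frame, run_start, pos, n_codons)
                run_start = pos + 3  # next run begins after the stop
            pos += 3
        # Run that reaches the end of the sequence without a stop.
        n_codons = (pos - run_start) // 3
        if n_codons >= min_codons:
            yield (frame, run_start, pos, n_codons)


def long_frames_both_strands(seq, min_codons=100):
    """Collect long stop-free runs on the given strand and its complement."""
    return {
        "given": list(stop_free_runs(seq, min_codons)),
        "complementary": list(
            stop_free_runs(reverse_complement(seq), min_codons)
        ),
    }


def chance_stop_free(n_codons):
    """Probability that n_codons random codons contain no stop codon,
    assuming independent, equally frequent bases (61 of 64 codons are
    sense codons). Real sequences deviate from this model.
    """
    return (61 / 64) ** n_codons
```

Under the uniform-base assumption, `chance_stop_free(100)` is below 1%, which gives a sense of why many reading frames longer than 100 codons on the complementary strand would be unexpected by chance. Coordinates reported for the complementary strand refer to the reverse-complemented sequence.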
