Complete nucleotide sequence of SV40 DNA

Abstract
The determination of the total 5224 base-pair DNA sequence of the virus SV40 gave the precise location of the known genes on the genome. At least 15.2% of the genome is presumably not translated into polypeptides. Particular points of interest revealed by the complete sequence are that the early t and T [tumor] antigens are initiated at the same position and that the T antigen is coded by 2 non-contiguous regions of the genome; the T antigen mRNA is spliced in the coding region. In the late region the gene for the major protein VP1 overlaps those for proteins VP2 and VP3 over 122 nucleotides but is read in a different frame. The almost complete amino acid sequences of the 2 early proteins and those of the late proteins were deduced from the nucleotide sequence. The mRNAs for the latter 3 proteins are presumably spliced out of a common primary RNA transcript. The use of degenerate codons is decidedly non-random, but is similar for the early and late regions. Codons of the type NUC, NCG and CGN [N = any nucleotide base] are absent or very rare.
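As an illustration of what the codon-type statement means in practice, the following minimal Python sketch tallies the triplets of a sequence in a chosen reading frame and counts how many fit the patterns NUC, NCG and CGN. It is not the authors' procedure. The function names (`count_codons`, `matches`, `pattern_totals`) are hypothetical, and the sequence `toy` is an invented 18-nucleotide string, not SV40 data.

```python
from collections import Counter


def count_codons(seq, frame=0):
    """Count non-overlapping triplets of seq, starting at offset frame (0, 1 or 2)."""
    rna = seq.upper().replace("T", "U")  # use the RNA alphabet, as the abstract does
    codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
    return Counter(codons)


def matches(codon, pattern):
    """Return True if codon fits pattern, where N stands for any base."""
    return all(p == "N" or p == c for c, p in zip(codon, pattern))


def pattern_totals(counts, patterns=("NUC", "NCG", "CGN")):
    """Sum the codon counts that fall under each wildcard pattern."""
    return {
        p: sum(n for codon, n in counts.items() if matches(codon, p))
        for p in patterns
    }


# Invented toy sequence (assumption for illustration only, not SV40 data).
toy = "ATGGCTTTCCGAACGTAA"

frame0 = pattern_totals(count_codons(toy, frame=0))
frame1 = pattern_totals(count_codons(toy, frame=1))
print(frame0)
print(frame1)
```

Reading the same toy sequence from `frame=1` instead of `frame=0` yields a different set of triplets, which is the same effect that lets the VP1 gene share nucleotides with the VP2 and VP3 genes while coding for a different amino acid sequence.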