Abstract
I show that the recognition sequences of Type II restriction systems are correlated with the G+C content of the host bacterial DNA. Almost all restriction systems with G+C rich tetranucleotide recognition sequences are found in species with A+T rich genomes, whereas G+C rich hexanucleotide and octanucleotide recognition sequences are found almost exclusively in species with G+C rich genomes. Most hexanucleotide recognition sequences found in species with A+T rich genomes are A+T rich. This distribution eliminates a substantial proportion of the potential variance in the frequency of restriction recognition sequences in the host genomes. As a consequence, almost all restriction recognition sequences, including those eight base pairs in length (Not I and Sfi I), are predicted to occur with a frequency ranging from once every 300 to once every 5,000 base pairs in the host genome. Since the G+C content of bacteriophage DNA and that of the host genome are also correlated, the data presented are evidence that most Type II "restriction systems" are indeed involved in phage restriction.
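The dependence of predicted site frequency on genome composition can be illustrated with a short calculation. The sketch below is not necessarily the method used in this paper: it assumes a simple model in which bases occur independently, G and C each with probability (G+C)/2 and A and T each with probability (A+T)/2. The function names `site_probability` and `expected_spacing` are hypothetical, chosen for illustration only.

```python
def site_probability(site: str, gc: float) -> float:
    """Probability that a given position starts an occurrence of `site`.

    Assumption (not from the source): bases are independent, with
    P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2.
    'N' matches any base and contributes a factor of 1.
    """
    if not 0.0 < gc < 1.0:
        raise ValueError("gc must be a fraction strictly between 0 and 1")
    p_gc = gc / 2.0          # probability of G, and separately of C
    p_at = (1.0 - gc) / 2.0  # probability of A, and separately of T
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        elif base == "N":
            continue  # any base is accepted at this position
        else:
            raise ValueError(f"unsupported symbol in site: {base!r}")
    return prob


def expected_spacing(site: str, gc: float) -> float:
    """Mean number of base pairs between occurrences of `site`."""
    return 1.0 / site_probability(site, gc)


if __name__ == "__main__":
    # Not I site (GCGGCCGC) and a G+C rich tetranucleotide (GGCC)
    # evaluated at three illustrative genome compositions.
    for gc in (0.3, 0.5, 0.7):
        for site in ("GGCC", "GCGGCCGC"):
            spacing = expected_spacing(site, gc)
            print(f"G+C = {gc:.0%}  {site:<8}  ~1 site per {spacing:,.0f} bp")
```

Under these assumptions, the eight-base Not I sequence is expected about once every 4,400 bp in a genome of 70% G+C, but only once every 65,536 bp at 50% G+C, while GGCC is expected about once every 2,000 bp in a genome of 30% G+C. Both favourable cases fall inside the range of 300 to 5,000 bp quoted above, which illustrates how matching site composition to genome composition narrows the spread of predicted frequencies.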