Optimizing Nucleotide Mixtures to Encode Specific Subsets of Amino Acids for Semi-Random Mutagenesis

Abstract
In random mutagenesis, synthesis of an NNN triplet (i.e., equiprobable A, C, G, and T at each of the three codon positions) could be considered an optimal nucleotide mixture because all 20 amino acids are encoded. NN(G,C) might be considered a slightly more intelligent “dope” because the entire set of amino acids is still encoded using only half as many codons (32 rather than 64). Using a general algorithm described herein, it is possible to formulate more complex doping schemes that encode specific subsets of the 20 amino acids while excluding the others from the mix. Maximizing the equiprobability of the amino acid residues in such a subset is suggested as an optimal basis for performing semi-random mutagenesis. Such dopes reduce the nucleotide complexity of combinatorial cassettes so that “sequence space” can be searched more efficiently. Computer programs have been developed to provide tables of optimized dopes compatible with automated DNA synthesizers.
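To make the comparison between NNN and NN(G,C) concrete, the following minimal Python sketch enumerates the codons produced by a dope and tallies the residues they encode. It is an illustration only, not the algorithm or the programs referred to above. It assumes the standard genetic code and equiprobable bases at each position; the names `residue_frequencies` and `summarize` are hypothetical, and the max/min frequency ratio is just one possible stand-in for "equiprobability", not necessarily the measure the authors optimize. The NTN dope is a textbook example of a subset-encoding mixture and is not taken from the paper's tables.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def residue_frequencies(dope):
    """Return {residue: probability} for a dope.

    `dope` is three strings, one per codon position, listing the allowed
    bases. Assumption: the listed bases are equiprobable at each position.
    """
    codons = ["".join(c) for c in product(*dope)]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


def summarize(name, dope):
    """Print codon count, residues encoded, stop fraction, and max/min ratio."""
    freqs = residue_frequencies(dope)
    n_codons = len(dope[0]) * len(dope[1]) * len(dope[2])
    amino = {aa: p for aa, p in freqs.items() if aa != "*"}
    ratio = max(amino.values()) / min(amino.values())
    print(f"{name}: {n_codons} codons, {len(amino)} amino acids "
          f"({''.join(sorted(amino))}), stop fraction {freqs.get('*', 0):.3f}, "
          f"max/min residue ratio {ratio:.1f}")


NNN = ("ACGT", "ACGT", "ACGT")
NNS = ("ACGT", "ACGT", "GC")   # NN(G,C)
NTN = ("ACGT", "T", "ACGT")    # illustrative subset dope (hydrophobic residues)

for name, dope in [("NNN", NNN), ("NN(G,C)", NNS), ("NTN", NTN)]:
    summarize(name, dope)
```

Under these assumptions, both NNN and NN(G,C) encode all 20 amino acids, but NN(G,C) needs 32 codons instead of 64, includes one stop codon (TAG) instead of three, and narrows the spread between the most and least frequent residues. NTN shows the other direction the abstract describes: fixing one position restricts the mixture to a specific subset (Phe, Leu, Ile, Met, Val) and excludes stop codons entirely.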