“Sequence space soup” of proteins and copolymers

Abstract
To study the protein folding problem, we use exhaustive computer enumeration to explore “sequence space soup,” an imaginary solution containing the “native” conformations (i.e., those of lowest free energy) under folding conditions of every possible copolymer sequence. The model is of short self‐avoiding chains of hydrophobic (H) and polar (P) monomers configured on the two‐dimensional square lattice. By exhaustive enumeration, we identify all native structures for every possible sequence. We find that random sequences of H/P copolymers will bear striking resemblance to known proteins: Most sequences under folding conditions will be approximately as compact as known proteins and will have considerable amounts of secondary structure, and an arbitrary sequence will most probably fold to a number of lowest free energy conformations that is of order one. In these respects, this simple model shows that proteinlike behavior should arise simply in copolymers in which one monomer type is highly solvent averse. It suggests that the structures and uniqueness of native proteins are not consequences of having 20 different monomer types, or of unique properties of amino acid monomers with regard to special packing or interactions, and thus that simple copolymers might be designable to collapse to proteinlike structures and properties. A good strategy for designing a sequence to have the minimum possible number of native states is to strategically insert many P monomers. Thus known proteins may be marginally stable due to a balance: More H residues stabilize the desired native state, but more P residues prevent simultaneous stabilization of undesired native states.
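The enumeration described above can be illustrated with a minimal sketch. It is not the authors' program. It assumes that the free energy under folding conditions is governed only by the number of H-H contacts between lattice neighbors that are not bonded along the chain, so that the native conformations of a sequence are those with the most such contacts. All function names (`conformations`, `hh_contacts`, `native_states`, `degeneracy_histogram`) are hypothetical, and sequences that cannot form any H-H contact are treated as having every conformation tied for lowest free energy.

```python
from collections import Counter
from itertools import product

STEPS = ((1, 0), (0, 1), (-1, 0), (0, -1))


def conformations(n):
    """Yield self-avoiding walks of n >= 2 monomers on the square lattice,
    one representative per rotation/reflection class."""
    if n < 2:
        raise ValueError("need at least two monomers")

    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            # Until the chain first leaves the x axis, forbid downward
            # steps; this discards mirror images.
            if not turned and dy < 0:
                continue
            site = (x + dx, y + dy)
            if site in path:  # self-avoidance
                continue
            path.append(site)
            yield from extend(path, turned or dy != 0)
            path.pop()

    # Fixing the first bond along +x discards rotated copies.
    yield from extend([(0, 0), (1, 0)], False)


def hh_contacts(seq, conf):
    """Count H-H pairs that are lattice neighbors but not chain neighbors."""
    index = {site: i for i, site in enumerate(conf)}
    count = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        # Look right and up only, so each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                count += 1
    return count


def native_states(seq, confs):
    """Return (maximum H-H contacts, number of conformations reaching it)."""
    scores = [hh_contacts(seq, c) for c in confs]
    best = max(scores)
    return best, scores.count(best)


def degeneracy_histogram(n):
    """Map each native-state degeneracy to the number of the 2**n
    H/P sequences of length n that have it."""
    confs = list(conformations(n))
    hist = Counter()
    for seq in product("HP", repeat=n):
        _, g = native_states(seq, confs)
        hist[g] += 1
    return hist
```

Calling `degeneracy_histogram(n)` for a small chain length `n` gives the distribution of native-state counts over all sequences, which is the quantity behind the statement that most sequences have a number of native conformations of order one. The cost grows exponentially in both the number of sequences and the number of conformations, so this brute-force form is practical only for short chains.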