Factors governing the foldability of proteins

Abstract
We use a three‐dimensional lattice model of proteins to investigate systematically the global properties of polypeptide chains that determine folding to the native conformation starting from an ensemble of denatured conformations. In the coarse‐grained description, the polypeptide chain is modeled as a heteropolymer consisting of N beads confined to the vertices of a simple cubic lattice. The interactions between the beads are drawn from a Gaussian distribution of energies, with a mean value Bo < 0 that corresponds to the overall average hydrophobic interaction energy. We studied 56 sequences, all with a unique ground state (the native conformation), covering two values of N (15 and 27) and two values of Bo. The smaller value of |Bo| was chosen so that the average fraction of hydrophobic residues corresponds to that found in natural proteins. The larger value of |Bo| was selected with the expectation that only the fully compact conformations would contribute to the thermodynamic behavior. For N = 15 the entire conformation space (compact as well as noncompact structures) can be exhaustively enumerated, so the thermodynamic properties can be computed exactly at all temperatures. The thermodynamic properties of the 27‐mer chain were calculated using the slow cooling technique together with standard Monte Carlo simulations. The kinetics of approach to the native state for all the sequences was obtained using Monte Carlo simulations. For all sequences we find that there are two intrinsic characteristic temperatures, namely, Tθ and Tf. At the temperature Tθ the polypeptide chain makes a transition to a collapsed structure, while at Tf the chain undergoes a transition to the native conformation. We show that the foldability of sequences can be characterized entirely in terms of these two temperatures. Fast‐folding sequences have small values of σ = (Tθ − Tf)/Tθ, whereas slow folders have larger values of σ (the range of σ is 0 < σ < 1).
The calculated values of the folding times correlate extremely well with σ. An increase in σ from 0.1 to 0.7 can result in an increase of 5–6 orders of magnitude in folding times. In contrast, we demonstrate that there is no useful correlation between folding times and the energy gap between the native conformation and the first excited state at any N for any value of Bo. In particular, in the parameter space of the model, many sequences with varying energy gaps, all with roughly the same folding time, can be easily engineered. Folding sequences in this model, therefore, can be classified based solely on the value of σ. Fast folders have small values of σ (typically less than about 0.1), moderate folders have values of σ in the approximate range between 0.1 and 0.6, while for slow folders σ exceeds 0.6. The precise boundaries between these categories depend crucially on N and on the model. The range of σ for fast folders decreases with the length of the chain. At temperatures close to Tf, fast folders reach the native conformation via a native conformation nucleation collapse mechanism without forming any detectable intermediates, whereas for moderate folders only a fraction ϕ(T) of the molecules reaches the native conformation by this process. The remaining fraction reaches the native state via a three‐stage multipathway process. For slow folders ϕ(T) is close to zero at all temperatures. The simultaneous requirement of native state stability and kinetic accessibility can be achieved at high enough temperatures for those sequences with small values of σ. The utility of these results for de novo design of proteins is briefly discussed. Proteins 26:411–441
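
The two quantities at the center of the abstract, the contact energy of a lattice conformation and the foldability parameter σ, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the width `spread` of the Gaussian, the random seed, and the rule that only non‐bonded beads on adjacent lattice sites interact are all assumptions made for illustration. The default thresholds 0.1 and 0.6 are the approximate boundaries quoted above, which the abstract notes depend on N and on the model.

```python
import random
from itertools import combinations


def sigma(t_theta, t_f):
    """Foldability parameter: sigma = (T_theta - T_f) / T_theta."""
    if t_theta <= 0:
        raise ValueError("t_theta must be positive")
    return (t_theta - t_f) / t_theta


def classify(s, fast_max=0.1, slow_min=0.6):
    """Label a sequence by sigma using the approximate boundaries
    quoted in the abstract (assumed defaults; they vary with N)."""
    if s < fast_max:
        return "fast"
    if s <= slow_min:
        return "moderate"
    return "slow"


def random_contact_energies(n, b0, spread=1.0, seed=0):
    """Symmetric n x n matrix of pair energies drawn from a Gaussian
    with mean b0. The width `spread` is an assumption, since the
    abstract does not state it."""
    rng = random.Random(seed)
    b = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        b[i][j] = b[j][i] = rng.gauss(b0, spread)
    return b


def conformation_energy(coords, b):
    """Sum pair energies over non-bonded beads that occupy adjacent
    vertices of the cubic lattice (assumed contact rule)."""
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if j - i < 2:
            continue  # skip beads bonded along the chain
        distance = sum(abs(p - q) for p, q in zip(coords[i], coords[j]))
        if distance == 1:  # nearest neighbours on the lattice
            energy += b[i][j]
    return energy
```

For example, `classify(sigma(2.0, 1.0))` evaluates σ = 0.5 and labels the sequence a moderate folder. Passing a list of integer (x, y, z) bead positions and a matrix from `random_contact_energies` to `conformation_energy` gives the energy of one conformation under the assumed contact rule.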