Knowledge-based potential defined for a rotamer library to design protein sequences

Abstract
A knowledge-based potential for a rotamer library was developed to design protein sequences. Protein side-chain conformations are represented by 56 templates. The fitness of each template to a given structural site environment is evaluated by a function combining three knowledge-based terms: two-body side-chain packing, one-body hydration, and local conformation. The number of matches between the native sequence and the structural site environment in the database, and the number of virtually placed mismatches, both counted in advance, were transformed into energy scores. The function was assessed in three ways: the best-14 test (which checks whether the native rotamer at its structural site is ranked within the top quarter, i.e. the top 14, of the 56 fitness rank positions), structural stability analysis of mutants of human and T4 lysozymes, and an inverse-folding search of a sequence database with a structure profile. In all three, this function performs better than both the function derived with the conventional normalization and our previously developed function. De novo sequence design targeting various structural motifs was then conducted with the function. The sequences thus obtained exhibit reasonable molecular masses and hydrophobic/hydrophilic patterns similar to those of the native sequences of the targets, and they behave as homologs of the target proteins in BLASTP searches. This significant improvement is discussed in terms of the reference state used for normalization and the crucial role of short-range repulsion in preventing residue bumps.
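As an illustration of the best-14 test described in the abstract, the following is a minimal sketch in Python, not the authors' implementation. It assumes that each structural site comes with one energy score per rotamer template, that a lower energy means a better fit, and that ties are resolved in favor of the native rotamer. The names `native_rank` and `best14_rate` and the input layout are hypothetical.

```python
from typing import Sequence, Tuple

N_TEMPLATES = 56             # rotamer templates in the library (from the abstract)
TOP_K = N_TEMPLATES // 4     # a quarter of the rank positions, i.e. 14


def native_rank(energies: Sequence[float], native_index: int) -> int:
    """Return the 1-based rank of the native rotamer at one site.

    Assumption: lower energy means better fitness. Ties favor the native
    rotamer, because only strictly lower energies are counted ahead of it.
    """
    if len(energies) != N_TEMPLATES:
        raise ValueError(f"expected {N_TEMPLATES} energies, got {len(energies)}")
    native_energy = energies[native_index]
    return 1 + sum(1 for e in energies if e < native_energy)


def best14_rate(sites: Sequence[Tuple[Sequence[float], int]]) -> float:
    """Fraction of sites whose native rotamer ranks within the top 14.

    Each element of `sites` is a hypothetical pair
    (energies for the 56 templates, index of the native template).
    """
    if not sites:
        raise ValueError("at least one site is required")
    hits = sum(
        1 for energies, native_index in sites
        if native_rank(energies, native_index) <= TOP_K
    )
    return hits / len(sites)


if __name__ == "__main__":
    # Toy data: template i has energy i, so template 0 fits best.
    toy_energies = [float(i) for i in range(N_TEMPLATES)]
    toy_sites = [(toy_energies, 13), (toy_energies, 14)]
    print(best14_rate(toy_sites))
```

In the toy data, the native rotamer at the first site ranks 14th and counts as a hit, while the one at the second site ranks 15th and does not. A different tie-breaking rule or sign convention would change only `native_rank`.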