Abstract
A method is described for computing a low-resolution tertiary structure of a protein from its sequence alone. The steps are: (a) estimate the distance of each residue from the centroid of the molecule, using hydrophobicity data and additional geometrical constraints; (b) from these distances, construct a two-valued matrix whose elements record whether each inter-residue distance is greater or less than R, the radius of the molecule; (c) optimize to obtain a three-dimensional structure. The procedure requires only modest computing facilities and is applicable to proteins of 164 residues and presumably to larger ones. It produces structures with r (the correlation between inter-residue distances in the computed and native structures) between 0.5 and 0.7. Moreover, correctly inferring two or three long-range contacts suffices to yield structures with r values of 0.8–0.9. Because segments forming parallel or antiparallel folding structures intersect the radius vector at similar angles, some of these long-range contacts can be inferred from the centroidal distances by an elaboration of the procedure used to construct the input matrix. A criterion is also described for judging the quality of a proposed input matrix even when the native structure is not known.
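The abstract defines r and the two-valued matrix but gives no formulas, so the Python sketch below illustrates both under stated assumptions. It is not the authors' procedure. The function names are hypothetical. Taking R as the largest centroid distance is an assumption. `bounded_contact_matrix` applies only the triangle inequality, |d_i - d_j| <= dist(i, j) <= d_i + d_j, to show which matrix entries the centroid distances fix on their own. r is computed as the Pearson correlation over all residue pairs i < j.

```python
import math
from itertools import combinations


def centroid_distances(coords):
    """Distance of each residue (an (x, y, z) tuple) from the molecule's centroid."""
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    return [math.dist(p, centroid) for p in coords]


def bounded_contact_matrix(d, radius):
    """Hypothetical helper: entries of the two-valued matrix fixed by centroid distances alone.

    Uses only the triangle inequality |d_i - d_j| <= dist(i, j) <= d_i + d_j:
    1 = certainly closer than radius, 0 = certainly farther, None = undetermined.
    """
    n = len(d)
    m = [[None] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1  # a residue is trivially within R of itself
        for j in range(i + 1, n):
            if d[i] + d[j] < radius:
                value = 1
            elif abs(d[i] - d[j]) > radius:
                value = 0
            else:
                value = None
            m[i][j] = m[j][i] = value
    return m


def two_valued_matrix(coords, radius):
    """Reference matrix from known coordinates: 1 if dist(i, j) < radius, else 0."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) < radius else 0 for j in range(n)]
            for i in range(n)]


def distance_correlation(computed, native):
    """Pearson correlation r between inter-residue distances of two structures (pairs i < j)."""
    if len(computed) != len(native):
        raise ValueError("structures must have the same number of residues")
    x = [math.dist(a, b) for a, b in combinations(computed, 2)]
    y = [math.dist(a, b) for a, b in combinations(native, 2)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all inter-residue distances are equal")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy four-residue "native" structure and a uniformly scaled "model" of it.
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
    model = [(1.1 * x, 1.1 * y, 1.1 * z) for x, y, z in native]
    print(round(distance_correlation(model, native), 6))  # 1.0
    d = centroid_distances(native)
    print(bounded_contact_matrix(d, max(d)))  # assumption: R = largest centroid distance
```

Uniform scaling leaves r at 1.0 because r measures only the linear agreement between the two sets of distances, not their absolute size. Every entry that `bounded_contact_matrix` fixes agrees with `two_valued_matrix` for any structure. Entries left as `None` are the ones that additional information, such as the hydrophobicity data and constraints mentioned above, would have to resolve.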