An evaluation of the performance of an automated procedure for comparative modelling of protein tertiary structure

Abstract
A 3-D model of a protein can be constructed from its amino acid sequence and the 3-D structures of one or more homologues by annealing three sets of fragments: the structurally conserved regions, structurally variable regions and the side chains. The method encoded in the computer program COMPOSER was assessed by generating 3-D models of eight proteins whose crystal structures are already known and for which 3-D structures of homologues are available. In the structurally conserved regions, differences between modelled and X-ray structures are smaller than the differences between the X-ray structures of the modelled protein and the homologues used to build the model. When several homologues are used, the contributions of the known structures are weighted, preferably by the square of sequence similarity; this is especially important when the similarities of the homologues to the modelled structure differ greatly. The ‘collar’ extension approach, in which a similar region of different length in a homologue is used to extend the framework, can result in a more accurate model. If known homologues comprise more than one related group of proteins and they are both distantly related to the unknown, then alignment of the sequence to be modelled with each group of homologues facilitates identification of structurally conserved regions of the unknown and leads to an improved model. Models have root mean square differences (r.m.s.d.s) with the structures defined by X-ray analysis of between 0.73 and 1.56 Å for all Cα atoms, for seven of the eight models. For the model of mucor pepsin, where the closest homologue has 33% sequence identity and 20% of the residues are in structurally variable regions, the r.m.s.d. for the framework region is 1.71 Å and the r.m.s.d. for all Cα atoms is 3.47 Å.
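
The weighting and the r.m.s.d. measure described above can be illustrated with a minimal sketch. It is not COMPOSER's implementation: it assumes the homologue Cα coordinates have already been superposed and reduced to equivalent positions, that similarity is given as a fraction between 0 and 1, and the function names `weighted_framework` and `rmsd` are hypothetical.

```python
import math


def weighted_framework(structures, similarities):
    """Average superposed C-alpha coordinates over several homologues.

    Each homologue contributes in proportion to the square of its
    sequence similarity to the target (assumed to be a fraction in 0..1).
    `structures` is a list of equally long lists of (x, y, z) tuples.
    """
    weights = [s ** 2 for s in similarities]
    total = sum(weights)
    n_atoms = len(structures[0])
    framework = []
    for i in range(n_atoms):
        # Weighted mean of atom i over all homologues, per coordinate axis.
        framework.append(tuple(
            sum(w * struct[i][k] for w, struct in zip(weights, structures)) / total
            for k in range(3)
        ))
    return framework


def rmsd(a, b):
    """Root mean square difference between two superposed coordinate lists."""
    squared = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(squared / len(a))


# Toy data (made up for illustration): two homologues with two C-alpha atoms each.
hom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
hom_b = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = weighted_framework([hom_a, hom_b], [0.8, 0.4])
print(model, rmsd(model, hom_a))
```

With similarities of 0.8 and 0.4 the squared weights are in the ratio 4:1, so the averaged framework lies much closer to the more similar homologue than an unweighted mean would.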