Optimal protein structure alignments by multiple linkage clustering: application to distantly related proteins

Abstract
A fully automatic procedure for aligning two protein structures is presented. Its sole measure of structural similarity is the root mean square (r.m.s.) deviation of superimposed backbone atoms (N, Cα, C and O), and it is designed to yield optimal solutions with respect to this measure. In a first step, the procedure identifies segments with similar conformations in both proteins. In a second step, a novel multiple linkage clustering algorithm identifies the segment combinations that yield optimal global structure alignments. Several structure alignments can usually be obtained for a given pair of proteins; these are exploited here to define automatically the common structural core of a protein family. Furthermore, an automatic analysis of the clustering trees is described which enables the detection of rigid-body movements between structure elements. To illustrate the performance of our procedure, we apply it to two families of distantly related proteins. The first comprises the three α+β proteins ubiquitin, ferredoxin and the B1-domain of protein G. Their common structure motif consists of four β-strands and the single α-helix, with one strand and the helix displaced as a rigid body relative to the remaining three β-strands. The second family consists of β-proteins from the Greek key group, in particular actinoxanthin, the immunoglobulin variable domain and plastocyanin. Their consensus motif, composed of five β-strands and a turn, is identified, mostly intact, in all Greek key proteins except the trypsins and, interestingly, also in three other β-protein families: the lipocalins, the neuraminidases and the lectins. This result provides new insights into the evolutionary relationships in the very diverse group of all β-proteins.
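As an illustration of the similarity measure named above, the following minimal sketch computes the r.m.s. deviation of two equal-length sets of backbone atom coordinates after optimal rigid-body superposition. The abstract does not state which superposition method the procedure uses; the Kabsch algorithm is assumed here as a standard choice, and the function name `superposed_rmsd` and the use of NumPy are illustrative assumptions, not part of the described procedure. Each input is taken to be an array with one row per atom (for a segment of n residues, 4n rows for N, Cα, C and O), with atoms listed in the same order in both structures.

```python
import numpy as np


def superposed_rmsd(a, b):
    """R.m.s. deviation of two matched coordinate sets after optimal superposition.

    Assumption: the Kabsch algorithm stands in for whatever superposition
    method the described procedure actually uses.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("inputs must both have shape (n_atoms, 3)")

    # Remove the translational component by centering both sets.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)

    # 3x3 covariance matrix between the two centered sets.
    h = a0.T @ b0
    u, _, vt = np.linalg.svd(h)

    # Force a proper rotation (determinant +1) so mirror images are not matched.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate the first set onto the second and measure the residual deviation.
    diff = a0 @ rot.T - b0
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

A value near zero means the two atom sets differ only by a rigid-body motion; the reflection check ensures that a structure and its mirror image are not reported as identical.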