Alignment of 700 globin sequences: Extent of amino acid substitution and its correlation with variation in volume

Abstract
Seven‐hundred globin sequences, including 146 nonvertebrate sequences, were aligned on the basis of conservation of secondary structure and the avoidance of gap penalties. Of the 182 positions needed to accommodate all the globin sequences, only 84 are common to all, including the absolutely conserved PheCD1 and HisF8. The mean number of amino acid substitutions per position ranges from 8 to 13 for all globins and from 5 to 9 for internal positions. Although the total sequence volumes vary by only ∼2–3%, the variation in volume per position ranges from ∼13% for the internal positions to ∼21% for the surface positions. Plausible correlations exist between amino acid substitution and the variation in volume per position for the 84 common positions and for the internal positions, but not for the surface positions. The amino acid substitution matrix derived from the 84 common positions was used to evaluate sequence similarity within the globins, and between the globins and phycocyanins C and colicins A, via calculation of pair‐wise similarity scores. The scores for globin‐globin comparisons over the 84 common positions overlap the globin‐phycocyanin and globin‐colicin scores, with the globin‐phycocyanin scores being intermediate. For the subset of internal positions, overlap between the three groups of scores is minimal. These results imply a continuum of amino acid sequences able to assume the common three‐on‐three α‐helical structure and suggest that the determinants of this structure include sites other than those inaccessible to solvent.
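The per-position quantities described above can be illustrated with a minimal Python sketch. It is not the authors' procedure. It assumes that "substitution" at a position can be approximated by the number of distinct residues observed in an alignment column, that "variation in volume" can be expressed as a coefficient of variation, and that gaps are written as "-". The function name `column_stats`, the toy alignment, and the residue volume table (approximate literature values in cubic angstroms) are all illustrative assumptions.

```python
from statistics import mean, pstdev

# Approximate residue volumes in cubic angstroms.
# Illustrative values only (an assumption); not necessarily those used in the paper.
RESIDUE_VOLUME = {
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5,
    "Q": 143.8, "E": 138.4, "G": 60.1, "H": 153.2, "I": 166.7,
    "L": 166.7, "K": 168.6, "M": 162.9, "F": 189.9, "P": 112.7,
    "S": 89.0, "T": 116.1, "W": 227.8, "Y": 193.6, "V": 140.0,
}


def column_stats(alignment):
    """Return (column index, distinct residue count, volume CV in %)
    for every column that contains no gap in any sequence."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    stats = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if "-" in column:
            continue  # keep only positions common to all sequences
        volumes = [RESIDUE_VOLUME[res] for res in column]
        # Coefficient of variation: population standard deviation over the mean.
        cv = 100.0 * pstdev(volumes) / mean(volumes)
        stats.append((i, len(set(column)), cv))
    return stats


# Hypothetical toy alignment, not real globin data.
toy_alignment = ["VLSPA-DK", "VLSEGEWQ", "-LTPAEKS"]
for pos, n_distinct, cv in column_stats(toy_alignment):
    print(pos, n_distinct, round(cv, 1))
```

Restricting the loop to gap-free columns mirrors the idea of analysing only the positions common to all sequences. Internal and surface subsets could be handled the same way by passing in a list of column indices, which this sketch leaves out.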