Protein Family Expansions and Biological Complexity

Abstract
During the course of evolution, new proteins are produced largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members are descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution, organisms of increasing complexity have arisen. In this paper we determine the identity of those superfamilies whose relative sizes in different organisms are highly correlated with the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes, we took the number of different cell types of which they are composed. Of 1,219 superfamilies, 194 have sizes in the 38 organisms that are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation, and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.

One of the main goals in biology is to understand how complex organisms have evolved. Much of an organism's physiology, and hence complexity, is determined by its protein repertoire. The repertoire has been largely formed by the duplication, divergence, and combination of genes. This means that proteins can be grouped into families whose members are descended from a common ancestor. The authors have examined the sizes of 1,219 protein families in 38 eukaryotes of different complexity. Only a small fraction of protein families have expansions that are correlated with the number of cell types in the organisms. Half of these families are involved in regulation or extracellular processes. Other families do have expansions, but in a lineage-specific manner. Thus, certain protein family expansions are “progressive” in that they lead to increases in biological complexity; other expansions are “conservative” in that they help an organism to adapt better to its environment but do not increase its complexity. This means that there is no simple correlation between an organism's complexity and the number of its genes.
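
The correlation analysis summarised above can be illustrated with a minimal sketch. It assumes a Pearson correlation coefficient and a cutoff of 0.8, neither of which is stated in this summary, and it uses raw family sizes without the normalisation that "relative sizes" implies. The function names, family labels, and all numbers are hypothetical toy values chosen for illustration, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def correlated_families(family_sizes, cell_types, threshold):
    """Return {family: r} for families whose sizes track cell-type number.

    family_sizes maps a family name to its size in each organism, in the
    same organism order as cell_types. The threshold is an assumption.
    """
    selected = {}
    for name, sizes in family_sizes.items():
        r = pearson(sizes, cell_types)
        if r >= threshold:
            selected[name] = r
    return selected


# Hypothetical toy data: four organisms ordered by increasing complexity.
toy_cell_types = [1, 10, 50, 200]
toy_sizes = {
    "progressive_like": [2, 20, 100, 400],    # grows with cell-type number
    "lineage_specific_like": [5, 300, 4, 6],  # expanded in one organism only
}

toy_result = correlated_families(toy_sizes, toy_cell_types, threshold=0.8)
print(sorted(toy_result))
```

In this toy case only the family whose size rises with the number of cell types passes the cutoff; the family expanded in a single organism does not, which mirrors the distinction drawn above between "progressive" and "conservative" expansions.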