Protein Family Expansions and Biological Complexity
Open Access
- 26 May 2006
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 2 (5) , e48
- https://doi.org/10.1371/journal.pcbi.0020048
Abstract
During the course of evolution, new proteins are produced very largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies, whose members are descended from a common ancestor. The sizes of superfamilies vary greatly. Over the same course of evolution, organisms of increasing complexity have arisen. In this paper we identify those superfamilies whose relative sizes in different organisms are highly correlated with the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes, we took the number of different cell types of which they are composed. Of 1,219 superfamilies, 194 have sizes in the 38 organisms that are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half of them are involved in extracellular processes or regulation, and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation for the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.

One of the main goals in biology is to understand how complex organisms have evolved. Much of an organism's physiology, and hence complexity, is determined by its protein repertoire. The repertoire has been formed largely by the duplication, divergence, and combination of genes.
This means that proteins can be grouped into families whose members are descended from a common ancestor. The authors have examined the sizes of 1,219 protein families in 38 eukaryotes of differing complexity. Only a small fraction of protein families have expansions that are correlated with the number of cell types in the organisms. Half of these families are involved in regulation or extracellular processes. Other families have also expanded, but in a lineage-specific manner. Thus, certain protein family expansions are “progressive” in that they lead to increases in biological complexity; other expansions are “conservative” in that they help an organism to adapt better to its environment but do not increase its complexity. This means that there is no simple correlation between an organism's complexity and the number of its genes.
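The central step described in the abstract, ranking superfamilies by how closely their relative size tracks the number of cell types, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a superfamily's relative size is its domain count divided by the total number of domains assigned in that organism, it uses Spearman rank correlation as the statistic, and the 0.8 cutoff, the organism data and the superfamily names (`SF_A`, `SF_B`, `SF_C`) are all hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either series has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return cov / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlated_superfamilies(counts, totals, cell_types, threshold=0.8):
    """Return superfamilies whose relative size correlates with cell-type number.

    counts: {superfamily: [domain count in each organism]}
    totals: total domains assigned in each organism (assumed normaliser)
    cell_types: number of cell types in each organism, same order
    threshold: hypothetical cutoff for a "strong" positive correlation
    """
    hits = {}
    for sf, per_org in counts.items():
        relative = [c / t for c, t in zip(per_org, totals)]
        rho = spearman(relative, cell_types)
        if rho >= threshold:  # NaN (constant relative size) never passes
            hits[sf] = rho
    # Strongest correlations first.
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical toy data for five organisms; not taken from the paper.
    cell_types = [1, 3, 30, 100, 200]
    totals = [1000, 2000, 5000, 10000, 20000]
    counts = {
        "SF_A": [1, 4, 20, 60, 150],        # share grows with complexity
        "SF_B": [50, 100, 250, 500, 1000],  # constant share
        "SF_C": [30, 40, 50, 60, 80],       # share shrinks with complexity
    }
    print(correlated_superfamilies(counts, totals, cell_types))
    # {'SF_A': 1.0}
```

Ranks are used here because cell-type counts span several orders of magnitude; a Pearson correlation on log-transformed counts would be an equally plausible choice, so the statistic and normalisation the paper actually uses should be checked before reusing this sketch.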