Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination
- 1 January 2003
- journal article
- research article
- Published by Springer Nature in Journal of Structural and Functional Genomics
- Vol. 4 (2/3), 67-78
- https://doi.org/10.1023/a:1026113408773
Abstract
There is a limited repertoire of domain families in nature that are duplicated and combined in different ways to form the set of proteins in a genome. Most proteins in both prokaryote and eukaryote genomes consist of two or more domains, and we show that the family size distribution of multi-domain protein families follows a power law like that of individual families. Most domain pairs occur in four to six different domain architectures: in isolation and in combinations with different partners. We showed previously that within the set of all pairwise domain combinations, most small and medium-sized families are observed in combination with one or two other families, while a few large families are very versatile and combine with many different partners. Though this may appear to be a stochastic pattern, in which large families have more combination partners by virtue of their size, we establish here that all the domain families with more than three members in genomes are duplicated more frequently than would be expected by chance considering their number of neighbouring domains. This duplication of domain pairs is statistically significant for between one and three quarters of all families with seven or more members. For the majority of pairwise domain combinations, there is no known three-dimensional structure of the two domains together, and we term these novel combinations. Novel domain combinations are interesting and important targets for structural elucidation, as the geometry and interaction between the domains will help understand the function and evolution of multi-domain proteins. Of particular interest are those combinations that occur in the largest number of multi-domain proteins, and several of these frequent novel combinations contain DNA-binding domains.
Abbreviations: SCOP: Structural Classification of Proteins database; PDB: Protein Data Bank; HMM: hidden Markov model.
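The counting that underlies statements such as "most domain pairs occur in four to six different domain architectures" can be illustrated with a minimal sketch. It is not the authors' method or code. It assumes a hypothetical input format (a dictionary from protein identifier to its ordered tuple of domain family names), treats a domain pair as two sequentially adjacent domains in N-to-C order, and uses made-up family names. The function names `pair_architectures` and `pair_protein_counts` are illustrative only.

```python
from collections import defaultdict


def pair_architectures(proteins):
    """Map each adjacent domain pair to the set of architectures containing it.

    `proteins` maps a protein id to its ordered tuple of domain family names
    (a hypothetical input format, not the data used in the paper).
    """
    archs = defaultdict(set)
    for domains in proteins.values():
        arch = tuple(domains)
        # Adjacent pairs in sequence order; single-domain proteins yield none.
        for left, right in zip(arch, arch[1:]):
            archs[(left, right)].add(arch)
    return archs


def pair_protein_counts(proteins):
    """Count how many proteins contain each adjacent domain pair at least once."""
    counts = defaultdict(int)
    for domains in proteins.values():
        arch = tuple(domains)
        # set() so a pair repeated within one protein is counted once.
        for pair in set(zip(arch, arch[1:])):
            counts[pair] += 1
    return counts


# Toy data with made-up family names (assumption, for illustration only).
toy = {
    "p1": ("A", "B"),
    "p2": ("A", "B", "C"),
    "p3": ("D", "A", "B"),
    "p4": ("A", "B"),
    "p5": ("C",),
}
archs = pair_architectures(toy)
counts = pair_protein_counts(toy)
```

In this toy set the pair `("A", "B")` appears in three distinct architectures (in isolation, followed by `C`, and preceded by `D`). Comparing such observed counts against counts from randomly shuffled domain neighbours would be one way to approach the kind of random model the abstract describes, though the paper's actual procedure is not given here.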