The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties
Open Access
- 25 July 2002
- journal article
- research article
- Published by Springer Nature in Genome Biology
- Vol. 3 (8) , 1-7
- https://doi.org/10.1186/gb-2002-3-8-research0040
Abstract
The sequencing of genomes provides us with an inventory of the 'molecular parts' in nature, such as protein families and folds, and their functions in living organisms. Through the analysis of such inventories, it has been shown that different genomes have very different usage of parts; for example, the common folds in the worm are very different from those in Escherichia coli. Despite these differences, we find that the genomic occurrence of generalized parts follows a well-known mathematical framework called the power law, with a few parts occurring many times and most occurring only a few times. This observation is true in a wide variety of genomic contexts. Earlier studies found power laws in a few specific cases, such as the occurrence of protein families. Here, we find many further cases of power-law behavior, for example in the occurrence of pseudogenes and in levels of gene expression. We show comprehensively that this behavior applies across many different genomes, for many different types of parts (DNA words, InterPro families, protein superfamilies and folds, pseudogene families and pseudomotifs), and for the many disparate attributes associated with these parts (their functions, interactions and expression levels). Power-law behavior provides a concise mathematical description of an important biological feature: the sheer dominance of a few members over the overall population. We present this behavior in a unified framework and propose that all these observations are connected to an underlying DNA duplication process as genomes evolved to their current state.
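As an illustration of what "follows a power law" means in practice, the sketch below fits the form F = a * V^(-b) to a table of occurrence counts by ordinary least squares on log-transformed values. It is a minimal example under stated assumptions, not the authors' procedure: the function name `fit_power_law`, the variable names, and the synthetic data (generated with a = 1000 and b = 1.5) are all hypothetical and chosen only for demonstration.

```python
import math


def fit_power_law(occurrences, frequencies):
    """Fit frequency = a * occurrence**(-b) by least squares in log-log space.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    xs = [math.log(v) for v in occurrences]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the straight line log(F) = log(a) - b * log(V).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic (not measured) data: the number of parts that occur V times,
# generated from an exact power law so that the fit can be checked.
occurrences = list(range(1, 51))
frequencies = [1000.0 * v ** -1.5 for v in occurrences]

a, b = fit_power_law(occurrences, frequencies)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On real genomic counts the points scatter around the line, and a straight-line fit on log-log axes is only a rough check; distinguishing a power law from other heavy-tailed distributions requires more careful statistics.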