Lineage-Specific Gene Expansions in Bacterial and Archaeal Genomes
Open Access
- 13 March 2001
- journal article
- letter
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 11 (4) , 555-565
- https://doi.org/10.1101/gr.gr-1660r
Abstract
Gene duplication is an important mechanistic antecedent to the evolution of new genes and novel biochemical functions. In an attempt to assess the contribution of gene duplication to genome evolution in archaea and bacteria, clusters of related genes that appear to have expanded subsequent to the diversification of the major prokaryotic lineages (lineage-specific expansions) were analyzed. Analysis of 21 completely sequenced prokaryotic genomes shows that lineage-specific expansions comprise a substantial fraction (∼5%–33%) of their coding capacities. A positive correlation exists between the fraction of the genes taken up by lineage-specific expansions and the total number of genes in a genome. Consistent with the notion that lineage-specific expansions are made up of relatively recently duplicated genes, >90% of the detected clusters consist of only two to four genes. The more common smaller clusters tend to include genes with higher pairwise similarity (as reflected by average score density) than larger clusters. Regardless of size, cluster members tend to be located more closely on bacterial chromosomes than expected by chance, which could reflect a history of tandem gene duplication. In addition to the small clusters, almost all genomes also contain rare large clusters of size ≥20. Several examples of the potential adaptive significance of these large clusters are explored. The presence or absence of clusters and their related genes was used as the basis for the construction of a similarity graph for completely sequenced prokaryotic genomes. The topology of the resulting graph seems to reflect a combined effect of common ancestry, horizontal transfer, and lineage-specific gene loss.
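As an illustration of the last step described in the abstract, a genome similarity graph can be sketched from presence/absence profiles. The abstract does not say which similarity measure the authors used; the Jaccard index, the `threshold` parameter, and all genome and family names below are assumptions chosen for illustration, not the paper's method or data.

```python
from itertools import combinations

# Hypothetical presence/absence profiles: genome -> set of cluster family IDs.
# Names and values are toy placeholders, not data from the paper.
profiles = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam2", "fam3", "fam4"},
    "genome_C": {"fam5"},
}


def jaccard(a, b):
    """Shared families divided by families present in either genome."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(profiles, threshold=0.0):
    """Return {(g1, g2): similarity} for pairs scoring above threshold.

    The Jaccard index is an assumed stand-in for the paper's measure.
    """
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[g1], profiles[g2])
        if score > threshold:
            edges[(g1, g2)] = score
    return edges


edges = similarity_graph(profiles)
```

In this toy input only `genome_A` and `genome_B` share families, so they are the only pair joined by an edge. With real profiles, such shared content could come from common ancestry or horizontal transfer, and missing edges from lineage-specific loss, which is why the abstract reads the graph topology as a combined signal.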