Bacterial Genomes as New Gene Homes: The Genealogy of ORFans in E. coli
- 1 June 2004
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 14 (6), 1036-1042
- https://doi.org/10.1101/gr.2231904
Abstract
Differences in gene repertoire among bacterial genomes are usually ascribed to gene loss or to lateral gene transfer from unrelated cellular organisms. However, most bacteria contain large numbers of ORFans, that is, annotated genes that are restricted to a particular genome and that possess no known homologs. The uniqueness of ORFans within a genome has precluded the use of a comparative approach to examine their function and evolution. However, by identifying sequences unique to monophyletic groups at increasing phylogenetic depths, we can make direct comparisons of the characteristics of ORFans of different ages in the Escherichia coli genome, and establish their functional status and evolutionary rates. Relative to the genes ancestral to γ-Proteobacteria and to those genes distributed sporadically in other prokaryotic species, ORFans in the E. coli lineage are short, A+T rich, and evolve quickly. Moreover, most encode functional proteins. Based on these features, ORFans are not attributable to errors in gene annotation, limitations of current databases, or to failure of methods for detecting homology. Rather, ORFans in the genomes of free-living microorganisms apparently derive from bacteriophage and occasionally become established by assuming roles in key cellular functions.