Evolutionary Analysis by Whole-Genome Comparisons
Open Access
- 15 April 2002
- journal article
- research article
- Published by American Society for Microbiology in Journal of Bacteriology
- Vol. 184 (8), 2260-2272
- https://doi.org/10.1128/jb.184.8.2260-2272.2002
Abstract
A total of 37 complete genome sequences of bacteria, archaea, and eukaryotes were compared. The percentage of orthologous genes of each species contained within any of the other 36 genomes was established. In addition, the mean identity of the orthologs was calculated. Several conclusions result: (i) a greater absolute number of orthologs of a given species is found in species with larger genomes than in those with smaller ones; (ii) a greater percentage of the orthologous genes of smaller genomes is contained in other species than is the case for larger genomes, which corresponds to a larger proportion of essential genes; (iii) before species can be specifically related to one another in terms of gene content, it is first necessary to correct for the size of the genome; (iv) eukaryotes have a significantly smaller percentage of bacterial orthologs after correction for genome size, which is consistent with their placement in a separate domain; (v) the archaebacteria are specifically related to one another but are not significantly different in gene content from the bacteria as a whole; (vi) determination of the mean identity of all orthologs (involving hundreds of gene comparisons per genome pair) reduces the impact of errors due to misidentification of orthologs and to misalignments, and thus it is far more reliable than single gene comparisons; (vii) however, there is a maximum amount of change in protein sequences, corresponding to 37% mean identity, which limits the use of percentage sequence identity to the lower taxa, a result which should also be true for single gene comparisons of both proteins and rRNA; (viii) most of the species that appear to be specifically related based upon gene content also appear to be specifically related based upon the mean identity of orthologs; (ix) the genes of a majority of species considered in this study have diverged too much to allow the construction of all-encompassing evolutionary trees. However, we have shown that eight species of gram-negative bacteria, six species of gram-positive bacteria, and eight species of archaebacteria are specifically related in terms of gene content, mean identity of orthologs, or both.
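The two quantities the abstract relies on, the percentage of a genome's genes that have orthologs in another genome and the mean identity of those orthologs, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `summarize_orthologs`, the input layout, and the toy numbers are all assumptions made for illustration, and the step that actually identifies orthologs is taken as already done.

```python
from collections import defaultdict


def summarize_orthologs(gene_counts, ortholog_hits):
    """Summarize gene content and mean identity per ordered genome pair.

    gene_counts: dict mapping genome name -> number of protein-coding genes
        (hypothetical input).
    ortholog_hits: iterable of (query_genome, target_genome, percent_identity),
        one entry per query gene with an ortholog in the target genome
        (hypothetical input; ortholog detection itself is not shown).

    Returns a dict mapping (query, target) ->
        (ortholog count, percent of query genes with an ortholog, mean identity).
    """
    identities = defaultdict(list)
    for query, target, identity in ortholog_hits:
        identities[(query, target)].append(identity)

    summary = {}
    for (query, target), values in identities.items():
        n = len(values)
        percent_of_genes = 100.0 * n / gene_counts[query]
        mean_identity = sum(values) / n
        summary[(query, target)] = (n, percent_of_genes, mean_identity)
    return summary


# Toy data, invented for illustration only.
gene_counts = {"small": 4, "large": 10}
hits = [
    ("small", "large", 50.0),
    ("small", "large", 40.0),
    ("large", "small", 50.0),
    ("large", "small", 40.0),
]
for pair, stats in summarize_orthologs(gene_counts, hits).items():
    print(pair, stats)
```

In this toy case both directions share the same two orthologs, yet they make up 50% of the small genome and only 20% of the large one. This is the asymmetry behind conclusions (ii) and (iii), and it is why gene content needs a genome-size correction before species are compared.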