Signature Genes as a Phylogenomic Tool
Open Access
- 23 April 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 25 (8) , 1659-1667
- https://doi.org/10.1093/molbev/msn115
Abstract
Gene content has been shown to contain a strong phylogenetic signal, yet its usage for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss and until now required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition. We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that ∼92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. In summary, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.
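The idea in the abstract can be illustrated with a toy sketch. The abstract does not give the selection criteria, so everything concrete here is an assumption made for illustration: the function names `find_signature_genes` and `profile_sample`, the `min_coverage` threshold, the flat (non-hierarchical) taxon labels, and the rule that a signature gene must be absent from every genome outside its taxon. It is not the authors' pipeline, which works at multiple taxonomic depths and on real sequence data.

```python
from collections import Counter


def find_signature_genes(genomes, taxon_of, min_coverage=1.0):
    """Return {gene: taxon} for genes found only within a single taxon.

    genomes:      dict mapping genome name -> set of gene family ids
    taxon_of:     dict mapping genome name -> taxon label (flat, hypothetical)
    min_coverage: assumed fraction of a taxon's genomes that must carry the gene
    """
    members = {}
    for genome, taxon in taxon_of.items():
        members.setdefault(taxon, []).append(genome)

    signatures = {}
    for taxon, inside in members.items():
        # Every gene seen in any genome that does not belong to this taxon.
        outside_genes = set()
        for genome, other in taxon_of.items():
            if other != taxon:
                outside_genes |= genomes[genome]

        # Count how many genomes of this taxon carry each gene.
        counts = Counter(gene for genome in inside for gene in genomes[genome])
        for gene, n in counts.items():
            if gene not in outside_genes and n / len(inside) >= min_coverage:
                signatures[gene] = taxon
    return signatures


def profile_sample(sample_genes, signatures):
    """Count signature-gene hits per taxon in an unlabeled set of genes."""
    hits = Counter(signatures[g] for g in sample_genes if g in signatures)
    return dict(hits)


# Toy, made-up presence/absence data (not from the paper).
genomes = {
    "A1": {"g1", "g2", "g9"},
    "A2": {"g1", "g3", "g9"},
    "B1": {"g4", "g5", "g9"},
    "B2": {"g4", "g9"},
}
taxon_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

signatures = find_signature_genes(genomes, taxon_of)
profile = profile_sample({"g1", "g4", "g7"}, signatures)
```

In this toy data, `g9` is shared by both taxa and is therefore not a signature, while `g2`, `g3`, and `g5` are taxon-specific but fail the assumed coverage threshold. Lowering `min_coverage` would admit them, which is one way such a sketch could tolerate the parallel gene loss mentioned in the abstract.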