Application of tetranucleotide frequencies for the assignment of genomic fragments
- 11 August 2004
- journal article
- research article
- Published by Wiley in Environmental Microbiology
- Vol. 6 (9) , 938-947
- https://doi.org/10.1111/j.1462-2920.2004.00624.x
Abstract
Summary: A basic problem of the metagenomic approach in microbial ecology is the assignment of genomic fragments to a certain species or taxonomic group when suitable marker genes are absent. Currently, the (G + C)‐content, together with phylogenetic information and codon adaptation for functional genes, is mostly used to assess the relationship of different fragments. These methods, however, can produce ambiguous results. In order to evaluate sequence‐based methods for fragment identification, we extensively compared (G + C)‐contents and tetranucleotide usage patterns of 9054 fosmid‐sized genomic fragments generated in silico from 118 completely sequenced bacterial genomes (40 982 931 fragment pairs were compared in total). The results of this systematic study show that the discriminatory power of correlations of tetranucleotide‐derived z‐scores is by far superior to that of differences in (G + C)‐content and provides reasonable assignment probabilities when applied to metagenome libraries of small diversity. Using six fully sequenced fosmid inserts from a metagenomic analysis of microbial consortia mediating the anaerobic oxidation of methane (AOM), we demonstrate that discrimination based on tetranucleotide‐derived z‐score correlations was consistent with corresponding data from 16S ribosomal RNA sequence analysis and allowed us to discriminate between fosmid inserts that were indistinguishable with respect to their (G + C)‐contents.
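The abstract contrasts two ways of comparing fragments: the difference in (G + C)‐content and the correlation of tetranucleotide‐derived z‐scores. The Python sketch below illustrates both measures. It is not the authors' implementation, and several choices are assumptions. Expected tetranucleotide counts are estimated from tri‐ and dinucleotide counts with a maximal‐order Markov model: E(n1n2n3n4) = N(n1n2n3)·N(n2n3n4)/N(n2n3). Both strands are counted, and two fragments are compared by the Pearson correlation of their 256 z‐scores. All function names (`gc_content`, `tetra_zscores`, `pearson`, `pair_count`) are hypothetical. The `pair_count` helper also checks the abstract's arithmetic: an all‐against‐all comparison of 9054 fragments yields 9054 × 9053 / 2 = 40 982 931 unordered pairs.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]  # all 256 tetranucleotides


def gc_content(seq):
    """Fraction of G + C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in BASES)
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmer_counts(seq, k, counts=None):
    """Add overlapping k-mer counts of seq to counts, skipping words with non-ACGT symbols."""
    counts = {} if counts is None else counts
    s = seq.upper()
    for i in range(len(s) - k + 1):
        word = s[i:i + k]
        if all(c in BASES for c in word):
            counts[word] = counts.get(word, 0) + 1
    return counts


def tetra_zscores(seq):
    """256 z-scores of observed vs. expected tetranucleotide counts (assumed Markov model)."""
    counts = {2: {}, 3: {}, 4: {}}
    for strand in (seq, reverse_complement(seq)):  # assumption: count both strands
        for k in (2, 3, 4):
            kmer_counts(strand, k, counts[k])
    z = []
    for t in TETRAMERS:
        n1234 = counts[4].get(t, 0)
        n123 = counts[3].get(t[:3], 0)
        n234 = counts[3].get(t[1:], 0)
        n23 = counts[2].get(t[1:3], 0)
        if n23 == 0:
            z.append(0.0)  # no inner dinucleotide observed: no expectation
            continue
        expected = n123 * n234 / n23
        variance = expected * (n23 - n123) * (n23 - n234) / n23 ** 2
        z.append((n1234 - expected) / sqrt(variance) if variance > 0 else 0.0)
    return z


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def pair_count(n_fragments):
    """Number of unordered fragment pairs in an all-against-all comparison."""
    return n_fragments * (n_fragments - 1) // 2


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    frag_a = "".join(rng.choice(BASES) for _ in range(5000))
    frag_b = "".join(rng.choice(BASES) for _ in range(5000))
    print("GC difference:", abs(gc_content(frag_a) - gc_content(frag_b)))
    print("z-score correlation:", pearson(tetra_zscores(frag_a), tetra_zscores(frag_b)))
    print(pair_count(9054))  # 40982931
```

Counting both strands makes the z‐score profile independent of which strand a fosmid insert happens to be sequenced from. A fragment and its reverse complement therefore yield identical vectors.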