GC/AT-content spikes as genomic punctuation marks
- 17 November 2004
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 101 (48), 16855-16860
- https://doi.org/10.1073/pnas.0407821101
Abstract
Large-scale analysis of the GC-content distribution at the gene level reveals both common features and basic differences in genomes of different groups of species. Sharp changes in GC content are detected at the transcription boundaries for all species analyzed, including human, mouse, rat, chicken, fruit fly, and worm. However, two substantially distinct groups of GC-content profiles can be recognized: warm-blooded vertebrates including human, mouse, rat, and chicken, and invertebrates including fruit fly and worm. In vertebrates, sharp positive and negative spikes of GC content are observed at the transcription start and stop sites, respectively, and there is also a progressive decrease in GC content from the 5' untranslated region to the 3' untranslated region along the gene. In invertebrates, the positive and negative GC-content spikes at the transcription start and stop sites are preceded by spikes of opposite value, and the highest GC content is found in the coding regions of the genes. Cross-correlation analysis indicates high frequencies of GC-content spikes at transcription start and stop sites. The strong conservation of this genomic feature seen in comparisons of the human/mouse and human/rat orthologs, and the clustering of genes with GC-content spikes on chromosomes imply a biological function. The GC-content spikes at transcription boundaries may reflect a general principle of genomic punctuation. Our analysis also provides means for identifying these GC-content spikes in individual genomic sequences.
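As a rough illustration of what a GC-content spike at a transcription boundary means, the sketch below computes a sliding-window GC profile and scores a candidate site by comparing GC content in a short core window against its flanks. It is not the authors' method: the abstract does not give window sizes or the details of the cross-correlation analysis, so the function names (`gc_fraction`, `gc_profile`, `spike_score`) and the `core` and `flank` parameters are hypothetical choices made only for this example.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq, window=50, step=1):
    """Sliding-window GC profile as a list of (window_start, gc_fraction).

    window and step are illustrative defaults, not values from the paper.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def spike_score(seq, site, core=20, flank=100):
    """Score a candidate boundary at index `site`.

    The score is the GC fraction of a short core window centered on the
    site minus the GC fraction of the flanking sequence on both sides.
    Positive values suggest a GC-rich spike, negative values an AT-rich one.
    core and flank are assumed sizes chosen for illustration.
    """
    lo = max(0, site - core // 2)
    hi = min(len(seq), site + core // 2)
    core_gc = gc_fraction(seq[lo:hi])
    background = gc_fraction(seq[max(0, lo - flank):lo] + seq[hi:hi + flank])
    return core_gc - background


# Synthetic sequences: a GC-rich or AT-rich block embedded in the opposite background.
positive_spike = "AT" * 50 + "GC" * 10 + "AT" * 50
negative_spike = "GC" * 50 + "AT" * 10 + "GC" * 50
```

Under these assumptions, `spike_score(positive_spike, 110)` is positive and `spike_score(negative_spike, 110)` is negative, mirroring the positive and negative spikes the abstract reports at transcription start and stop sites in vertebrates. Real genomic sequence would need a significance threshold and handling of ambiguous bases such as `N`, neither of which is attempted here.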