Nucleotide Frequency Variation Across Human Genes
Open Access
- 1 December 2003
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (12), 2594–2601
- https://doi.org/10.1101/gr.1317703
Abstract
The frequencies of individual nucleotides fluctuate significantly across eukaryotic genes. In this paper, we investigate nucleotide variation across an averaged representation of all known human genes. Such a representation allows us to average out random fluctuations (noise) and uncover remarkable systematic trends in nucleotide distributions, particularly near the boundaries between genetic elements: the promoter, exons, and introns. We propose that these variations result from differential mutational pressures and from the presence of specific regulatory motifs, such as transcription and splicing factor binding sites. Specifically, we observe significant GC and TA biases (excess of G over C and T over A) in the noncoding regions of genes. Such biases are most probably caused by transcription-coupled mismatch repair, an effect that has recently been detected in mammalian genes. We then examine the distribution of all hexanucleotides and identify motifs that are overrepresented within regulatory regions. By clustering and aligning such sequences, we recognize families of putative regulatory elements involved in exonic and intronic splicing control and in 3′ mRNA processing. Some of our motifs have been identified in prior theoretical and experimental studies, thus validating our approach, but we also detect several novel sequences that we propose as candidates for future functional assays and mutation screens for genetic disorders.
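The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline, and several choices are assumptions: the skews use the common convention (G − C)/(G + C) and (T − A)/(T + A), since the abstract only speaks of an "excess" of G over C and T over A; the positional profile assumes equal-length sequences already aligned at a shared boundary (for example, an exon–intron junction); and overrepresentation is scored with a plain frequency ratio against a background set, using a pseudocount, in place of whatever statistic the paper itself applies. All function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def skews(seq):
    """Return (GC skew, TA skew) for one sequence.

    Assumed convention: GC skew = (G - C) / (G + C), TA skew = (T - A) / (T + A).
    Positive values indicate an excess of G over C (or T over A).
    """
    s = seq.upper()
    g, c, t, a = (s.count(b) for b in "GCTA")
    gc = (g - c) / (g + c) if g + c else 0.0
    ta = (t - a) / (t + a) if t + a else 0.0
    return gc, ta


def positional_frequencies(seqs):
    """Average base frequencies column by column across aligned sequences.

    Assumes every sequence has the same length and is anchored at a common
    boundary, so column i means "i bases from the boundary" in every gene.
    """
    profile = []
    for i in range(len(seqs[0])):
        column = Counter(s[i].upper() for s in seqs)
        total = sum(column[b] for b in BASES)  # ignore N and other symbols
        profile.append({b: column[b] / total if total else 0.0 for b in BASES})
    return profile


def kmer_counts(seq, k=6):
    """Count overlapping k-mers (hexanucleotides by default), skipping non-ACGT words."""
    s = seq.upper()
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    return Counter({w: n for w, n in counts.items() if set(w) <= set(BASES)})


def overrepresented(region_seqs, background_seqs, k=6, min_ratio=2.0, pseudocount=1.0):
    """Return (word, ratio) pairs whose frequency in the region set is at least
    min_ratio times their frequency in the background set, highest ratio first.

    Illustrative scoring only: a pseudocount-smoothed frequency ratio.
    """
    region, background = Counter(), Counter()
    for s in region_seqs:
        region.update(kmer_counts(s, k))  # Counter.update adds counts
    for s in background_seqs:
        background.update(kmer_counts(s, k))
    n_region, n_bg = sum(region.values()), sum(background.values())
    n_words = 4 ** k  # pseudocount is spread over every possible k-mer
    hits = []
    for word, count in region.items():
        f_region = (count + pseudocount) / (n_region + pseudocount * n_words)
        f_bg = (background[word] + pseudocount) / (n_bg + pseudocount * n_words)
        ratio = f_region / f_bg
        if ratio >= min_ratio:
            hits.append((word, ratio))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

For instance, `skews("GGGC")` gives `(0.5, 0.0)`, and `overrepresented(["GGGGGGGG"], ["ACGTACGTACGT"])` flags only `GGGGGG`. The pseudocount keeps words that are absent from the background from producing infinite ratios. A model-based expectation (for example, one derived from a Markov chain fitted to the background) would be a more rigorous alternative to the plain ratio.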