Mutations of Different Molecular Origins Exhibit Contrasting Patterns of Regional Substitution Rate Variation
Open Access
- 29 February 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 4 (2), e1000015
- https://doi.org/10.1371/journal.pcbi.1000015
Abstract
Transitions at CpG dinucleotides, referred to as “CpG substitutions”, are a major mutational input into vertebrate genomes and a leading cause of human genetic disease. The prevalence of CpG substitutions is due to their mutational origin, which is dependent on DNA methylation. In comparison, other single nucleotide substitutions (for example, those occurring at GpC dinucleotides) mainly arise from errors during DNA replication. Here we analyzed high-quality BAC-based data from human, chimpanzee, and baboon to investigate regional variation of CpG substitution rates. We show that CpG substitutions occur approximately 15 times more frequently than other single nucleotide substitutions in primate genomes, and that they exhibit substantial regional variation. Patterns of CpG rate variation are consistent with differences in methylation level and susceptibility to subsequent deamination. In particular, we propose a “distance-decaying” hypothesis, positing that, due to the molecular mechanism of a CpG substitution, rates are correlated with the stability of double-stranded DNA surrounding each CpG dinucleotide, and that the effect of local DNA stability may decrease with distance from the CpG dinucleotide. Consistent with our “distance-decaying” hypothesis, rates of CpG substitution are strongly (negatively) correlated with regional G+C content. The influence of G+C content decays as the distance from the target CpG site increases. We estimate that the influence of local G+C content extends up to 1,500–2,000 bp centered on each CpG site. We also show that the distance-decaying relationship persisted when we controlled for the effect of long-range homogeneity of nucleotide composition. GpC sites, in contrast, do not exhibit such a “distance-decaying” relationship. Our results highlight an example of the distinctive properties of methylation-dependent substitutions versus substitutions mostly arising from errors during DNA replication. Furthermore, the negative relationship between G+C content and CpG rates may provide an explanation for the observation that GC-rich SINEs show lower CpG rates than other repetitive elements.
Author Summary
Mutations are the raw materials of evolution. Earlier studies have shown that mutations occur at different frequencies in different genomic regions. By investigating the patterns and causes of such “regional” variation of mutations, we can better understand the mechanisms underlying mutagenesis. In the human and other mammalian genomes, the most common type of mutation is caused by DNA methylation, which targets cytosines followed by guanine (CpG dinucleotides). Methylated cytosines are then subject to spontaneous deamination, which causes a C to T (or G to A) transition (CpG substitution). Because this mutational process is unique to CpG substitutions, we reasoned that they might show different patterns of variability from other substitutions. Using high-quality genomic sequences from primates and separately analyzing the variability of CpG substitutions and other substitutions, we demonstrate that CpG substitutions occur approximately 15 times more frequently than other substitutions and show a distinctive pattern of regional variability. In particular, we propose and provide evidence that, because the deamination step requires temporary strand separation, G+C composition within about 1,500–2,000 bp of a target CpG in each direction affects the probability of a CpG substitution. Incorporating the difference between CpG and other substitutions discovered in this study will help build more realistic evolutionary models.
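To make the “distance-decaying” idea concrete, the following is a minimal Python sketch of how G+C content could be measured in flanking bands at increasing distance from each CpG site. It is an illustration only and not the authors' pipeline: the function names (`cpg_positions`, `gc_fraction`, `flanking_gc`), the toy sequence, and the band widths are all assumptions introduced here. In a real analysis, the per-band G+C values would then be correlated with the CpG substitution rates estimated from the species alignment.

```python
def cpg_positions(seq):
    """Return 0-based indices of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases; None if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


def flanking_gc(seq, site, inner, outer):
    """G+C fraction of the two flanks lying between `inner` and `outer`
    bases away from the CpG whose C is at index `site`.

    The CpG itself (seq[site:site + 2]) is excluded, so the measurement
    is not inflated by the target dinucleotide. Flanks are truncated at
    the ends of the sequence.
    """
    left = seq[max(0, site - outer):max(0, site - inner)]
    right = seq[site + 2 + inner:site + 2 + outer]
    return gc_fraction(left + right)


# Toy sequence (hypothetical): a GC-rich core around one CpG, AT-rich beyond.
toy = "AAAAGGCGCCTTTT"
# Hypothetical (inner, outer) distance bands, in bases.
bands = [(0, 2), (2, 6)]

for site in cpg_positions(toy):
    profile = [flanking_gc(toy, site, lo, hi) for lo, hi in bands]
    print(site, profile)
```

Using non-overlapping bands rather than nested windows keeps each distance class independent, which is one simple way to ask whether sequence composition farther from the CpG site still carries information once nearer sequence is set aside.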