Abstract
The modified base 5-methylcytosine ((m)C) plays an important functional role in the biology of mammals as an epigenetic modification and appears to exert a striking impact on the molecular evolution of mammal genomes. The collective epigenetic functions of (m)C revolve around its effect on gene transcription, while the influence of this modified base on the evolution of mammal genomes derives from the greatly elevated spontaneous mutation rate of (m)C to T. In mammals, (m)C occurs at the dinucleotides CpG, CpA, and CpT. As a step toward a comprehensive statistical examination of the role of (m)C in mammal molecular evolution, we have developed novel Markov models of codon substitution that incorporate dinucleotide-level terms relevant to (m)C mutation. We apply these models to two data sets of aligned BRCA1 exon 11 sequences, one from bats and one from primates. In all cases, terms specific to mutations that affect the dinucleotides CpG, CpA, and CpT significantly improved model fit. For the CpG-specific terms, both transition and transversion substitution rates were elevated. These rates differed between the data sets, with bats exhibiting a lower relative rate of substitutions at CpG-containing codons. Transition substitution rates were significantly less than 1 at CpA-containing codons but greater than 1 at CpT-containing codons. The inclusion of interaction terms in the codon models to represent possible confounding with the effect of natural selection was supported for codons that contained CpG and CpT, but not CpA. From the results, we infer that mutation of (m)C is a probable factor that affects BRCA1 codons containing the dinucleotide CpG, a possible factor for CpA-containing codons, and an unlikely factor that affects CpT-containing codons. The confounding of estimated terms with the effect of natural selection indicates that this confounding must be addressed in comparisons between different coding and noncoding regions.
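To make the notion of a dinucleotide-level term concrete, the following minimal Python sketch classifies a single-nucleotide codon change as a transition or transversion and reports whether the changed position lies inside a given dinucleotide of the source codon. It is an illustration only, not the parameterization used in the models described above: the function names are hypothetical, only dinucleotides contained within a single codon are considered, and the complementary strand is ignored.

```python
# Illustrative sketch only; names and rules here are assumptions,
# not the models described in the abstract.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True for a purine<->purine or pyrimidine<->pyrimidine change."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def classify(codon_from, codon_to, dinuc="CG"):
    """Classify a codon change that differs at exactly one position.

    Returns (kind, affects_dinuc), where kind is "transition" or
    "transversion" and affects_dinuc is True when the changed position
    falls inside an occurrence of `dinuc` in the source codon.
    Returns None when the codons differ at zero or several positions.
    """
    diffs = [i for i in range(3) if codon_from[i] != codon_to[i]]
    if len(diffs) != 1:
        return None
    pos = diffs[0]
    if is_transition(codon_from[pos], codon_to[pos]):
        kind = "transition"
    else:
        kind = "transversion"
    # Within-codon dinucleotide windows start at positions 0 and 1.
    affects = any(
        codon_from[s:s + 2] == dinuc and s <= pos <= s + 1
        for s in (0, 1)
    )
    return kind, affects
```

For example, `classify("ACG", "ATG")` describes a C-to-T transition at the C of a CpG, the type of change expected from (m)C mutation, while `classify("TCA", "TTA", dinuc="CA")` applies the same check to CpA. In a codon substitution model, an indicator of this kind could select which rate entries receive a dinucleotide-specific multiplier.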