Sequence specificity in CpG mutation hotspots

Abstract
CpG dinucleotides are efficiently methylated in vertebrate genomes, except in CpG islands, which have a high C+G content. Methylated CpG is the most frequently mutated dinucleotide. Sequences surrounding disease-causing CpG mutation sites, collected from locus-specific mutation databases, were analyzed. Both tetra- and heptanucleotide analyses indicated a clear overall sequence preference for pyrimidines 5' and purines 3' to the mutated 5-methylcytosine. The most frequently mutated tetranucleotides are TCGA and TCGG; the former is also a frequent restriction and modification site. The results will help in elucidating the still controversial mutation mechanism of CpG doublets.
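
As an illustration of what a tetranucleotide context analysis involves, the following minimal Python sketch tallies the NCGN context around mutated CpG cytosines. It is not the procedure used in the study: the function names, the input format (a sequence plus the 0-based index of the mutated cytosine) and the example sequences are all hypothetical, and mutations recorded on the opposite strand are not handled.

```python
from collections import Counter


def tetranucleotide_context(seq, c_index):
    """Return the NCGN tetranucleotide around the cytosine at c_index.

    Returns None if the position is not the C of a CpG or if a flanking
    base is missing. The (sequence, index) input format is an assumption
    made for this sketch, not a format taken from any mutation database.
    """
    seq = seq.upper()
    # One base is needed 5' of the C, and two bases (G plus one more) 3' of it.
    if c_index < 1 or c_index + 2 >= len(seq):
        return None
    if seq[c_index:c_index + 2] != "CG":
        return None
    return seq[c_index - 1:c_index + 3]


def count_contexts(sites):
    """Count tetranucleotide contexts over (sequence, c_index) pairs."""
    counts = Counter()
    for seq, c_index in sites:
        context = tetranucleotide_context(seq, c_index)
        if context is not None:
            counts[context] += 1
    return counts


# Made-up sequences for illustration only; these are not real mutation data.
sites = [
    ("AATCGAGG", 3),
    ("GGTCGGAA", 3),
    ("CCTCGATT", 3),
    ("TTACGCAA", 3),
]
counts = count_contexts(sites)
for context, n in counts.most_common():
    print(context, n)
```

A heptanucleotide analysis could be sketched the same way by widening the window around the CpG, with correspondingly stricter bounds checks.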