Over-representation of the disease-associated (CAG) and (CGG) repeats in the human genome

Expansion of trimer repeats has recently been described as a new type of human mutation. Of the 64 possible trimer compositions, only the CGG and CAG repeats have been implicated in genetic diseases. This study addresses two questions: (1) What makes the CGG and CAG repeats unique? (2) Could other trimer repeats be involved in this type of mutation? By computer analysis of trimer and hexamer frequency distributions in approximately 10 Mb of human DNA, twenty trimer motifs (ten complementary pairs) were identified as the most likely to be expanded. The frequency distribution study also indicated that the expanded trimer motif in Fragile-X syndrome is GGC rather than CGG. DNA linguistics studies revealed that the GGC/GCC and CAG/CTG repeats are over-represented in the human genome. Further analysis of base composition suggested that the CCA/TGG repeats may also be involved in trimer expansion mutation, since they share many characteristics with GGC/GCC and CAG/CTG. The computer-aided sequence analysis reported here may help to clarify the molecular mechanisms of trimer repeat expansion.
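
The kind of frequency analysis summarized above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, whose details are not given here. The function names (`kmer_counts`, `reverse_complement`, `tandem_hexamer_count`, `observed_over_expected`) are hypothetical, and the expected count assumes independent base composition, which is only one of several possible null models.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in COMPLEMENT for base in kmer):
            counts[kmer] += 1
    return counts


def reverse_complement(motif):
    """Return the motif read on the opposite strand, e.g. CAG -> CTG."""
    return "".join(COMPLEMENT[base] for base in reversed(motif.upper()))


def tandem_hexamer_count(seq, trimer):
    """Count hexamers made of two tandem copies of the trimer, e.g. CAGCAG."""
    return kmer_counts(seq, 6)[trimer.upper() * 2]


def observed_over_expected(seq, motif):
    """Ratio of the observed motif count to the count expected if bases
    were independent (an assumed null model, not taken from the study)."""
    seq = seq.upper()
    motif = motif.upper()
    base_counts = Counter(base for base in seq if base in COMPLEMENT)
    total = sum(base_counts.values())
    if total == 0:
        return float("nan")
    counts = kmer_counts(seq, len(motif))
    expected = float(sum(counts.values()))  # number of valid windows
    for base in motif:
        expected *= base_counts[base] / total
    return counts[motif] / expected if expected else float("nan")


if __name__ == "__main__":
    toy = "CAGCAGCAG"  # toy sequence for illustration, not real genomic data
    print(kmer_counts(toy, 3))
    print(reverse_complement("GGC"), reverse_complement("CCA"))
    print(tandem_hexamer_count(toy, "CAG"))
    print(observed_over_expected(toy, "CAG"))
```

A ratio well above 1 for a motif and its reverse complement would correspond to the over-representation described in the text. Counting the cyclic permutations of a repeat separately (for example GGC, GCG and CGG) is one way a frequency distribution could distinguish between candidate repeat units.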