Positive Selection of a Pre-Expansion CAG Repeat of the Human SCA2 Gene

Abstract
A region of approximately one megabase of human Chromosome 12 shows extensive linkage disequilibrium in Utah residents with ancestry from northern and western Europe. This strikingly large linkage disequilibrium block was analyzed with statistical and experimental methods to determine whether natural selection could be implicated in shaping the current genome structure. Extended Haplotype Homozygosity and Relative Extended Haplotype Homozygosity analyses of this region mapped a core region of the most strongly conserved haplotype to exon 1 of the Spinocerebellar ataxia type 2 gene (SCA2). Direct DNA sequencing of this region of the SCA2 gene revealed a significant association between a pre-expanded allele [(CAG)8CAA(CAG)4CAA(CAG)8] of CAG repeats within exon 1 and the selected haplotype of the SCA2 gene. A significantly negative Tajima's D value (−2.20, p < 0.01) at this site was consistent with selection on the CAG repeat. This region was also investigated in three other populations, none of which showed signs of selection. These results suggest that recent positive selection of the pre-expansion SCA2 CAG repeat has occurred in Utah residents with European ancestry.
Natural selection ultimately acts on the genetic variants existing among human populations. It therefore leaves "footprints" behind in the human genome. In this study, Yu et al. identified an extremely large region on Chromosome 12 that is under positive selection in Utah residents with European ancestry by characterizing the correlation patterns of genomic variants. Further analyses of this interval suggested that selection centered on one of the many forms of the Spinocerebellar ataxia type 2 (SCA2) gene. The selected form was then shown to be associated with one short version of the disease-causing CAG repeat in the SCA2 gene. These results suggest that the CAG repeat was positively selected.
An abnormally long CAG repeat can cause SCA2, a neurodegenerative disease that severely impairs body movement. The authors show how they detected natural selection acting on the SCA2 gene. Their findings might lead to the discovery of the biological functions of this gene and its CAG repeat. Studies of this kind have the potential to facilitate the discovery of common disease genes.
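For readers unfamiliar with the Tajima's D statistic reported in the abstract, the following is a minimal sketch of the standard calculation (Tajima, 1989) from a set of aligned sequences. It is an illustration only and not the authors' analysis pipeline; the function name `tajimas_d` and the toy alignment are hypothetical, and the sketch assumes equal-length sequences with no gaps or missing data.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences (hypothetical helper)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites in the alignment.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return float("nan")  # D is undefined without polymorphism

    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989), functions of the sample size only.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # D compares pi with the Watterson estimate S / a1, scaled by its
    # estimated standard deviation.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy alignment (made up for illustration): every variant is a singleton,
# so rare alleles are in excess and D comes out negative.
toy = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA"]
print(tajimas_d(toy))
```

An excess of rare variants, as in the toy alignment, pulls the mean pairwise difference below the Watterson estimate and gives a negative D, which is the pattern expected after a recent selective sweep or population expansion.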