Triplet repeat length bias and variation in the human transcriptome
Open Access
- 6 October 2009
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 106 (40), pp. 17095-17100
- https://doi.org/10.1073/pnas.0907112106
Abstract
Length variation in short tandem repeats (STRs) constitutes an important family of DNA polymorphisms with numerous applications in genetics, medicine, forensics, and evolutionary analysis. Several major diseases, including Huntington's disease, hereditary ataxias, and spinobulbar muscular atrophy, have been associated with length variation in trinucleotide (triplet) repeats. Using the reference human genome, we have catalogued all triplet repeats in genic regions. These data revealed a bias in repeat lengths in noncoding DNA. The catalog also enabled a survey of repeat-length polymorphisms (RLPs) in human genomes and a comparison of the rate of polymorphism within humans with the divergence from chimpanzee. For short repeats, this analysis of three human genomes reveals a relatively low RLP rate in exons and, somewhat surprisingly, in introns. All short RLPs observed in multiple genomes are biallelic (at least in this small sample). In contrast, long repeats are highly polymorphic, and some long RLPs are multiallelic. For long repeats, the chimpanzee sequence frequently differs from all observed human alleles, which suggests a high expansion/contraction rate in all long repeats. However, our comparison of human-chimpanzee divergence with human RLPs reveals no discernible effect of natural selection on these expansions and contractions. Our catalog of human triplet repeats and their flanking regions can be used to produce a cost-effective whole-genome assay for testing individuals. Such a repeat assay could someday complement SNP arrays in tests that assess an individual's risk of developing a disease, or become part of a personalized genomic strategy that provides therapeutic guidance with respect to drug response.
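The abstract does not describe how repeats were located or how alleles were compared, so the following is only a minimal Python sketch of the general ideas, not the authors' pipeline. It assumes that a "triplet repeat" is a perfect, uninterrupted tandem run of one 3-nt unit (homopolymer units such as AAA excluded), that allele length is measured as copy number, and that a leftmost-phase scan is acceptable (a real catalog would likely normalize rotations such as CAG/AGC/GCA). All names (`TripletRepeat`, `find_triplet_repeats`, `flanking_sequences`, `classify_rlp`) and the thresholds `min_copies=4` and `flank=20` are hypothetical and chosen for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple, Optional


class TripletRepeat(NamedTuple):
    start: int   # 0-based start of the repeat tract
    unit: str    # repeated trinucleotide, e.g. "CAG"
    copies: int  # number of uninterrupted copies of the unit


def find_triplet_repeats(seq: str, min_copies: int = 4) -> list[TripletRepeat]:
    """Scan left to right for perfect tandem runs of a 3-nt unit.

    Hypothetical helper: min_copies=4 is an illustrative cutoff, not the paper's.
    """
    seq = seq.upper()
    repeats = []
    i, n = 0, len(seq)
    while i + 3 <= n:
        unit = seq[i:i + 3]
        if len(set(unit)) == 1:  # skip homopolymer "triplets" such as AAA
            i += 1
            continue
        copies, j = 1, i + 3
        while seq[j:j + 3] == unit:  # extend while the next 3 nt match the unit
            copies += 1
            j += 3
        if copies >= min_copies:
            repeats.append(TripletRepeat(i, unit, copies))
            i = j  # jump past the tract so shifted phases are not re-reported
        else:
            i += 1
    return repeats


def flanking_sequences(seq: str, rep: TripletRepeat, flank: int = 20) -> tuple[str, str]:
    """Return up to `flank` nt on each side of a repeat (e.g. for assay probes)."""
    end = rep.start + 3 * rep.copies
    return seq[max(0, rep.start - flank):rep.start], seq[end:end + flank]


def classify_rlp(human_alleles: list[int], chimp_copies: Optional[int] = None) -> dict:
    """Summarize copy-number alleles seen at one locus across human haplotypes."""
    counts = Counter(human_alleles)
    status = {1: "monomorphic", 2: "biallelic"}.get(len(counts), "multiallelic")
    result = {"status": status, "alleles": sorted(counts)}
    if chimp_copies is not None:
        # True when the chimpanzee length matches none of the observed human alleles
        result["chimp_novel"] = chimp_copies not in counts
    return result
```

For example, `find_triplet_repeats("TT" + "CAG" * 5 + "TT")` reports a single `CAG` tract of five copies starting at position 2, and `classify_rlp([7, 7, 9, 12], chimp_copies=10)` labels the locus multiallelic with a chimpanzee length absent from the human alleles, the pattern the abstract reports for many long repeats.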