Retroposed Copies of the HMG Genes: A Window to Genome Dynamics
Open Access
- 1 May 2003
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (5) , 800-812
- https://doi.org/10.1101/gr.893803
Abstract
Retroposed copies (RPCs) of genes are functional (intronless paralogs) or nonfunctional (processed pseudogenes) copies derived from mRNA through a process of retrotransposition. Previous studies found that gene families involved in mRNA translation or nuclear function were more likely to have large numbers of RPCs. Here we characterize RPCs of the few families coding for the abundant high-mobility-group (HMG) proteins in humans. Using an algorithm we developed, we identified and studied 219 HMG RPCs. For slightly more than 10% of these RPCs, we found evidence indicating expression. Furthermore, eight of these are potentially new members of the HMG families of proteins. For three RPCs, the evidence indicated expression as part of other transcripts; in all of these, we found the presence of alternative splicing or multiple polyadenylation signals. RPC distribution among the HMGs was not even, with 33–65 each for HMGB1, HMGB3, HMGN1, and HMGN2, and 0–6 each for HMGA1, HMGA2, HMGB2, and HMGN3. Analysis of the sequences flanking the RPCs revealed that the junction between the target site duplications and the 5′-flanking sequences exhibited the same TT/AAAA consensus found for the L1 endonuclease, supporting an L1-mediated retrotransposition mechanism. Finally, because our algorithm included aligning RPC flanking sequences with the corresponding HMG genomic sequence, we were able to identify transcribed regions of HMG genes that were not part of the published mRNA sequences.