The disparate nature of “intergenic” polyadenylation sites
- 24 August 2006
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in RNA
- Vol. 12 (10), 1794-1801
- https://doi.org/10.1261/rna.136206
Abstract
The termination of mature eukaryotic mRNAs occurs at specific polyadenylation sites located downstream from stop codons in the 3′-untranslated region (UTR). An accurate delineation of these sites is essential for the study of 3′-UTR-based gene regulation and for the design of pertinent probes for transcriptome analysis. Although typical poly(A) sites are located between 0 and 2 kb from the stop codon, EST sequence analyses have identified sites located at unexpectedly long ranges (5–10 kb) in a number of genes. Here we perform a complete mapping of EST and full-length cDNA sequences on the mouse and human genome to observe putative poly(A) sites extending beyond annotated 3′-ends and into the intergenic regions. We introduce several quality parameters for poly(A) site prediction and train a classification tree to associate P-values to predicted sites. We observe a higher than background level of high-scoring sites up to 12–15 kb past the stop codon, both in human and mouse. This leads to an estimate of about 5000 human genes having unreported 3′-end extensions and about 3500 novel polyadenylated transcripts lying in present “intergenic” regions. These high-scoring, long-range poly(A) sites corresponding to novel transcripts and gene extensions should be incorporated into current human and mouse gene repositories.
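To make the idea of attaching a score to each predicted site concrete, here is a minimal sketch of how the leaves of a decision tree can carry a P-value-like number. It is not the authors' tree. The two features (`supporting_ests`, `has_signal`), the split thresholds, and the training examples are all hypothetical, and the splits are written by hand rather than learned from data as a real classification tree would do. Each leaf simply stores the fraction of labelled training sites in it that were false.

```python
from collections import defaultdict

# Hypothetical labelled examples: (supporting_ests, has_signal, is_true_site).
# These values are invented for illustration only.
TRAINING = [
    (1, False, False),
    (1, False, False),
    (1, True, False),
    (1, True, True),
    (3, False, False),
    (3, False, True),
    (4, True, True),
    (6, True, True),
]


def leaf(supporting_ests, has_signal):
    """Route a candidate site to a leaf of a hand-written two-level tree."""
    if supporting_ests >= 2:
        return "multi_est_signal" if has_signal else "multi_est_no_signal"
    return "single_est_signal" if has_signal else "single_est_no_signal"


def fit_leaf_p_values(examples):
    """Return, per leaf, the fraction of training sites labelled false."""
    totals = defaultdict(int)
    false_counts = defaultdict(int)
    for ests, signal, is_true in examples:
        key = leaf(ests, signal)
        totals[key] += 1
        if not is_true:
            false_counts[key] += 1
    return {key: false_counts[key] / totals[key] for key in totals}


P_VALUES = fit_leaf_p_values(TRAINING)


def site_p_value(supporting_ests, has_signal):
    """Look up the P-value-like score of the leaf a candidate falls into."""
    return P_VALUES[leaf(supporting_ests, has_signal)]
```

With this toy data, a candidate supported by several ESTs and a signal lands in a leaf containing only true sites, so it receives the lowest score, while a single-EST candidate without a signal receives the highest.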