Analysis of the cDNAs of Hypothetical Genes on Arabidopsis Chromosome 2 Reveals Numerous Transcript Variants
Open Access
- 21 October 2005
- journal article
- Published by Oxford University Press (OUP) in Plant Physiology
- Vol. 139 (3), 1323-1337
- https://doi.org/10.1104/pp.105.063479
Abstract
In the fully sequenced Arabidopsis (Arabidopsis thaliana) genome, many gene models are annotated as “hypothetical protein.” Their gene structures are predicted solely by computer algorithms, with no support from expressed sequence matches in Arabidopsis or from nucleic acid or protein homologs in other species. To confirm their existence and predicted gene structures, a high-throughput method of rapid amplification of cDNA ends (RACE) was used to obtain their cDNA sequences from 11 cDNA populations. Primers were designed for all 797 hypothetical genes on chromosome 2, and, through 5′ and 3′ RACE, clones from 506 genes were sequenced and cDNA sequences from 399 target genes were recovered. The cDNA sequences were obtained by assembling their 5′ and 3′ RACE polymerase chain reaction products. These sequences revealed that (1) the structures of 151 hypothetical genes were different from their predictions; (2) 116 hypothetical genes had alternatively spliced transcripts and 187 genes displayed polyadenylation sites; and (3) there were transcripts arising from both strands, transcripts from the strand opposite to that of the prediction, and possible dicistronic transcripts. Promoters from five randomly chosen hypothetical genes (At2g02540, At2g31270, At2g33640, At2g35550, and At2g36340) were cloned into reporter constructs, and their expression was tissue or developmental stage specific. Our results indicate that at least 50% of hypothetical genes on chromosome 2 are expressed in the cDNA populations, with about 38% of the gene structures differing from their predictions. Thus, by using this targeted approach, high-throughput RACE, we revealed numerous transcripts, including many uncharacterized variants, from these hypothetical genes.
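The two percentages quoted in the abstract can be checked against the gene counts it reports. The short sketch below does that arithmetic. It assumes the "at least 50%" figure is the 399 recovered genes out of the 797 targeted, and that the "about 38%" figure is the 151 differing structures out of those 399 recovered genes. The abstract does not state the denominators explicitly, and the variable names are illustrative only.

```python
# Gene counts as reported in the abstract.
total_hypothetical_genes = 797         # hypothetical genes targeted on chromosome 2
genes_with_recovered_cdna = 399        # genes for which cDNA sequences were recovered
genes_differing_from_prediction = 151  # genes whose structure differed from the prediction


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: "expressed" means cDNA recovered, measured against all targeted genes.
expressed_pct = percent(genes_with_recovered_cdna, total_hypothetical_genes)

# Assumption: differing structures are measured against genes with recovered cDNA.
differing_pct = percent(genes_differing_from_prediction, genes_with_recovered_cdna)

print(f"Expressed: {expressed_pct:.1f}%")            # 50.1%
print(f"Differing structure: {differing_pct:.1f}%")  # 37.8%
```

Under these assumptions the counts are consistent with the abstract's rounded claims of "at least 50%" and "about 38%".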