Refined Annotation of the Arabidopsis Genome by Complete Expressed Sequence Tag Mapping
Open Access
- 1 June 2003
- journal article
- Published by Oxford University Press (OUP) in Plant Physiology
- Vol. 132 (2) , 469-484
- https://doi.org/10.1104/pp.102.018101
Abstract
Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.
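As an illustration of how spliced alignments can expose non-canonical splicing, the sketch below reads the dinucleotides at each intron boundary implied by a set of aligned exon coordinates and labels the intron as canonical (GT-AG) or not. It is a minimal sketch under stated assumptions, not GeneSeqer's scoring method: the function names, the 1-based inclusive forward-strand exon coordinates, and the toy sequence are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Donor/acceptor dinucleotides of the canonical intron class.
CANONICAL = ("GT", "AG")
# Well-documented non-canonical classes; any other pair is reported as "other".
KNOWN_NONCANONICAL = {("GC", "AG"), ("AT", "AC")}


def intron_boundaries(genome: str, exons: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Return (donor, acceptor) dinucleotides for each intron between exons.

    Assumption: exons are sorted, forward-strand, 1-based inclusive
    (start, end) coordinates, as a spliced alignment might report them.
    """
    pairs = []
    for (_, end_prev), (start_next, _) in zip(exons, exons[1:]):
        # Intron spans 1-based positions end_prev+1 .. start_next-1,
        # which is the 0-based slice [end_prev : start_next-1].
        intron = genome[end_prev:start_next - 1]
        pairs.append((intron[:2].upper(), intron[-2:].upper()))
    return pairs


def classify(pair: Tuple[str, str]) -> str:
    """Label an intron by its boundary dinucleotides."""
    if pair == CANONICAL:
        return "canonical"
    if pair in KNOWN_NONCANONICAL:
        return "non-canonical (known class)"
    return "non-canonical (other)"


if __name__ == "__main__":
    # Toy sequence (hypothetical): exon, GT-AG intron, exon, AT-AC intron, exon.
    genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATCC" + "ATATCCTTTTAC" + "TTTTAA"
    exons = [(1, 6), (19, 24), (37, 42)]
    for pair in intron_boundaries(genome, exons):
        print("-".join(pair), classify(pair))
```

A real pipeline would also need to handle reverse-strand alignments and low-quality alignments near exon boundaries, which this sketch ignores.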