LongSAGE analysis significantly improves genome annotation: identifications of novel genes and alternative transcripts in the mouse
Open Access
- 10 December 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (8) , 1393-1400
- https://doi.org/10.1093/bioinformatics/bti207
Abstract
Motivation: Owing to their increased tag length, LongSAGE tags are expected to be more reliable in direct assignment to genome sequences. We therefore evaluated the use of LongSAGE data in genome annotation, using our LongSAGE dataset of 202 015 tags (41 718 unique tags) experimentally generated from mouse embryonic tail libraries.
Results: A fraction of LongSAGE tags could not be unambiguously assigned to a single gene because of widely conserved sequences downstream of particular CATG anchor sites. Alternative transcript forms were confirmed in 45% of all detected genes. Surprisingly, a large fraction of LongSAGE tags with hits to the genome (66%) could not be assigned to any gene annotated in EnsEMBL. Among such cases, 2098 LongSAGE tags fell into a region containing a putative gene predicted by GenScan, providing experimental evidence for the presence of real genes, while 9112 genes were found to be omitted or wrongly annotated by the EnsEMBL pipeline.
Conclusions: LongSAGE transcriptome data can significantly improve genome annotation by identifying novel genes and alternative transcripts, even in the best-characterized organisms to date, such as the mouse.
Contact: imai@gsf.de
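The ambiguity described in the Results, where conserved sequence downstream of a CATG anchor site prevents a tag from being assigned to one gene, can be illustrated with a minimal sketch. This is not the authors' pipeline, which mapped tags to the genome. It only extracts a "virtual" tag from each transcript and reports tags shared by more than one gene. The function names, the toy sequences, and the 21-base tag length (CATG plus 17 downstream bases) are assumptions made for illustration and are not stated in the abstract.

```python
from collections import defaultdict

ANCHOR = "CATG"   # anchor site named in the abstract
TAG_LENGTH = 21   # assumption: anchor plus 17 downstream bases


def three_prime_tag(transcript, anchor=ANCHOR, tag_length=TAG_LENGTH):
    """Return the tag at the 3'-most anchor site that has enough
    downstream sequence for a full-length tag, or None if there is none."""
    seq = transcript.upper()
    pos = seq.rfind(anchor)
    while pos != -1:
        tag = seq[pos:pos + tag_length]
        if len(tag) == tag_length:
            return tag
        # Too close to the 3' end: fall back to the previous anchor site.
        pos = seq.rfind(anchor, 0, pos + len(anchor) - 1)
    return None


def assign_tags(transcripts):
    """Map each virtual tag to the set of gene names that produce it."""
    tag_to_genes = defaultdict(set)
    for gene, seq in transcripts.items():
        tag = three_prime_tag(seq)
        if tag is not None:
            tag_to_genes[tag].add(gene)
    return dict(tag_to_genes)


def ambiguous_tags(tag_to_genes):
    """Keep only tags that cannot be assigned to a single gene."""
    return {tag: genes for tag, genes in tag_to_genes.items() if len(genes) > 1}


# Hypothetical toy transcripts: geneA and geneB share the same sequence
# downstream of their 3'-most CATG, so their tags are indistinguishable.
shared = "CATG" + "ACGTACGTACGTACGTA"
toy_transcripts = {
    "geneA": "GGGTTT" + shared + "AAAA",
    "geneB": "CCCAAA" + shared + "AAAAAA",
    "geneC": "TTTT" + "CATG" + "T" * 17 + "AA",
}

tag_map = assign_tags(toy_transcripts)
ambiguous = ambiguous_tags(tag_map)
```

In this toy input, the tag shared by geneA and geneB ends up in `ambiguous`, while the geneC tag maps to a single gene. A longer tag reduces such collisions but cannot remove them when the conserved region extends past the tag.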