EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome
Open Access
- 1 March 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 35 (6) , 2074-2083
- https://doi.org/10.1093/nar/gkm081
Abstract
Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available draft genome sequence of the unicellular green alga Chlamydomonas reinhardtii to guide clustering and to correct errors in the ESTs. We have grouped all available EST and cDNA sequences into 12 063 ACEGs (assembly of contiguous ESTs based on genome) and generated 15 857 contigs of average length 934 nt. We predict that roughly 3000 of our contigs represent full-length transcripts. Compared to previous assemblies, ACEGs show extended contig length, increased accuracy and a reduction in redundancy. Because our assembly protocol also uses ESTs with no corresponding genomic sequences, it provides sequence information for genes interrupted by sequence gaps. Detailed analysis of randomly sampled ACEGs reveals several hundred putative cases of alternative splicing, many overlapping transcription units and new genes not identified by gene prediction algorithms. Our protocol, although developed for and tailored to the C. reinhardtii dataset, can be exploited by any eukaryotic genome project for which both a draft genome sequence and ESTs are available.
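The idea of letting genomic coordinates guide EST clustering can be illustrated with a minimal sketch. This is not the authors' protocol: it assumes each EST has already been aligned to the draft genome and reduced to a single span, and it groups ESTs whose spans overlap on the same scaffold and strand. The `Alignment` record, the `cluster_by_genome` function and the example coordinates are all hypothetical names and values chosen for illustration. ESTs without a genomic match, which the abstract says the real protocol also uses, are not handled here.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record: one genomic span per aligned EST."""
    est_id: str
    scaffold: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def cluster_by_genome(alignments):
    """Group ESTs whose genomic spans overlap on the same scaffold and strand."""
    # Bucket alignments by (scaffold, strand) so that only ESTs from the
    # same locus orientation can end up in the same cluster.
    by_locus = defaultdict(list)
    for aln in alignments:
        by_locus[(aln.scaffold, aln.strand)].append(aln)

    clusters = []
    for key in sorted(by_locus):
        current, current_end = [], None
        # Sweep left to right; an EST joins the open cluster if it starts
        # before the furthest end seen so far in that cluster.
        for aln in sorted(by_locus[key], key=lambda a: (a.start, a.end)):
            if current and aln.start < current_end:
                current.append(aln.est_id)
                current_end = max(current_end, aln.end)
            else:
                if current:
                    clusters.append(current)
                current = [aln.est_id]
                current_end = aln.end
        if current:
            clusters.append(current)
    return clusters


# Made-up coordinates, for illustration only.
example = [
    Alignment("est1", "scaffold_1", "+", 100, 600),
    Alignment("est2", "scaffold_1", "+", 550, 1200),
    Alignment("est3", "scaffold_1", "-", 580, 900),
    Alignment("est4", "scaffold_1", "+", 5000, 5400),
    Alignment("est5", "scaffold_2", "+", 100, 300),
]
clusters = cluster_by_genome(example)
```

In this example `est1` and `est2` fall into one cluster because their spans overlap on the same strand, while `est3` stays separate despite overlapping coordinates because it maps to the opposite strand. Keeping strands apart is one simple way to avoid merging overlapping transcription units of the kind the abstract mentions.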