Sequencing and Analysis of Approximately 40 000 Soybean cDNA Clones from a Full-Length-Enriched cDNA Library
Open Access
- 16 October 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in DNA Research
- Vol. 15 (6) , 333-346
- https://doi.org/10.1093/dnares/dsn024
Abstract
A large collection of full-length cDNAs is essential for the correct annotation of genomic sequences and for the functional analysis of genes and their products. We obtained a total of 39 936 soybean cDNA clones (GMFL01 and GMFL02 clone sets) from a full-length-enriched cDNA library constructed from soybean plants grown under various developmental and environmental conditions. Sequencing from the 5′ and 3′ ends of the clones generated 68 661 expressed sequence tags (ESTs). The EST sequences were clustered into 22 674 scaffolds, including 2580 full-length sequences. In addition, we sequenced 4712 full-length cDNAs. After removing overlaps, we have obtained 6570 new full-length soybean cDNA sequences so far. Our data indicate that 87.7% of the soybean cDNA clones contain complete coding sequences in addition to 5′- and 3′-untranslated regions. Together, the data confirm that our collection of soybean full-length cDNAs covers a wide variety of genes. Comparative analysis of the derived soybean sequences against data from Arabidopsis, rice and other legumes revealed that our collection includes some specific genes, a large part of which are annotated as having unknown function. The large set of soybean full-length cDNA clones reported in this study will serve as a useful resource for gene discovery in soybean and will also aid precise annotation of the soybean genome.