High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome
Open Access
- 30 June 2008
- journal article
- Published by Springer Nature in BMC Genomics
- Vol. 9 (1) , 312
- https://doi.org/10.1186/1471-2164-9-312
Abstract
Background: The benefits of high-throughput sequencing with 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid gene sequence and SNP discovery with this novel sequencing technology provides a set of baseline tools for genome-level research. However, it is unclear how effectively the sequencing of large numbers of short reads supports contig assembly and sequence annotation in species with essentially no prior gene sequence information.
Results: To generate the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (ESTs). The ESTs were generated from a normalized cDNA pool composed of multiple tissues and genotypes, which promoted the discovery of homologues to almost half of Arabidopsis genes and allowed a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes, we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates at non-synonymous nucleotides were on average 4x smaller than at synonymous nucleotides, suggesting purifying selection. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30, and its distribution was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection, in agreement with previous research.
Conclusion: By providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition, we demonstrated that SNPs sampled at large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.
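The abstract does not spell out how the modified theta (θ) parameter is defined, so the following is only a minimal sketch of the two kinds of quantity it refers to, using the textbook Watterson estimator rather than the authors' adapted version. All function names and the example numbers are hypothetical, and the number of sampled sequences is assumed to be known, which is not generally true for a pooled cDNA sample.

```python
from typing import Optional


def harmonic_number(n_sequences: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i, the scaling term in Watterson's estimator."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are needed to observe a polymorphism")
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta_per_site(segregating_sites: int, n_sequences: int, n_sites: int) -> float:
    """Standard Watterson's theta per site: S / (a_n * L).

    This is the textbook estimator, NOT the modified theta used in the paper.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return segregating_sites / (harmonic_number(n_sequences) * n_sites)


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks <= 0:
        return None
    return ka / ks


# Hypothetical contig: 11 SNPs among 4 sampled sequences over 1,000 aligned sites.
theta = watterson_theta_per_site(segregating_sites=11, n_sequences=4, n_sites=1000)

# Hypothetical substitution rates; a ratio below 1 is the pattern usually
# read as purifying selection, above 1 as diversifying selection.
ratio = ka_ks_ratio(ka=0.03, ks=0.10)
```

Computing such estimates separately for synonymous and non-synonymous sites, contig by contig, is one way the comparisons described in the Results could be reproduced in outline.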