Annotated Expressed Sequence Tags (ESTs) from pre-smolt Atlantic salmon (Salmo salar) in a searchable data resource
Open Access
- 2 July 2007
- journal article
- Published by Springer Nature in BMC Genomics
- Vol. 8 (1) , 209
- https://doi.org/10.1186/1471-2164-8-209
Abstract
To identify as many different transcripts/genes in the Atlantic salmon genome as possible, it is crucial to obtain high-quality cDNA libraries from different tissues and developmental stages, to sequence them (as ESTs or full-length sequences), and to predict the functions of the resulting transcripts. Such libraries allow identification of a large number of different transcripts and can provide valuable information on genes expressed in a particular tissue at a specific developmental stage. These data are important for constructing microarray chips, identifying SNPs in coding regions, and identifying genes in the future whole-genome sequence. An important factor that determines the usefulness of the generated data for biologists is efficient data access, and public searchable databases play a crucial role in providing such a service.

Twenty-three Atlantic salmon cDNA libraries were constructed from 15 tissues, yielding nearly 155,000 clones. From these libraries, 58,109 ESTs were generated, of which 57,212 were used for contig assembly. Following deletion of mitochondrial sequences, 55,118 EST sequences were submitted to GenBank. In all, 20,019 unique sequences, consisting of 6,424 contigs and 13,595 singlets, were generated. The Norwegian Salmon Genome Project Database has been constructed, and annotation was performed using the annotation transfer approach. Annotation was successful for 50.3% (10,075) of the sequences, and 6,113 sequences (30.5%) were annotated with Gene Ontology terms for molecular function, biological process and cellular component.

We describe the construction of cDNA libraries from juvenile/pre-smolt Atlantic salmon (Salmo salar), EST sequencing, clustering, and annotation by assigning putative functions to the transcripts. These sequences represent 97% of all sequences submitted to GenBank from the pre-smoltification stage. The data have been grouped into datasets according to their source and type of annotation. Various data query options are offered, including searches on function assignments and Gene Ontology terms. Data delivery options include summaries of the datasets and their annotations, detailed self-explanatory annotations, and access to the original BLAST results and Gene Ontology annotation trees. Annotation searches indicated the potential presence of a relatively high number of immune-related genes in the dataset.
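The sequence tallies reported in the abstract can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the constant names and the `percent` helper are illustrative, and it assumes that both annotation rates are expressed relative to the 20,019 unique sequences (an assumption that reproduces the reported 50.3% and 30.5%).

```python
# Counts copied from the abstract; the names are illustrative only.
ESTS_GENERATED = 58_109
ESTS_ASSEMBLED = 57_212
CONTIGS = 6_424
SINGLETS = 13_595
UNIQUE_SEQUENCES = 20_019
ANNOTATED = 10_075
GO_ANNOTATED = 6_113


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Each unique sequence is either a contig (built from several ESTs) or a singlet.
assert CONTIGS + SINGLETS == UNIQUE_SEQUENCES

# Assumption: both rates use the unique sequences as the denominator.
annotation_rate = percent(ANNOTATED, UNIQUE_SEQUENCES)
go_rate = percent(GO_ANNOTATED, UNIQUE_SEQUENCES)

# ESTs that were sequenced but not used for contig assembly.
not_assembled = ESTS_GENERATED - ESTS_ASSEMBLED

print(f"Annotated: {annotation_rate}% | GO-annotated: {go_rate}%")
print(f"ESTs not used for assembly: {not_assembled}")
```

Running it prints `Annotated: 50.3% | GO-annotated: 30.5%`, matching the abstract, and shows that 897 of the generated ESTs were not used for assembly.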
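In general terms, annotation transfer means that a query sequence inherits the description and Gene Ontology terms of a sufficiently similar, already-annotated database entry found by a similarity search such as BLAST. The sketch below illustrates that idea under stated assumptions and does not reproduce the project's actual pipeline. The `BlastHit`/`Annotation` record layout, the `transfer_annotation` helper, the "best hit wins" rule and the 1e-5 e-value cutoff are all hypothetical, because the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical e-value cutoff; the abstract does not state the threshold used.
EVALUE_CUTOFF = 1e-5


@dataclass
class BlastHit:
    """One database hit for a query sequence (hypothetical record layout)."""
    subject_id: str
    description: str
    evalue: float
    go_terms: list[str] = field(default_factory=list)


@dataclass
class Annotation:
    """Annotation copied onto a query sequence from one of its hits."""
    query_id: str
    description: str
    source_hit: str
    go_terms: list[str]


def transfer_annotation(query_id: str, hits: list[BlastHit],
                        cutoff: float = EVALUE_CUTOFF) -> Optional[Annotation]:
    """Copy the annotation of the most significant hit that passes the cutoff.

    Returns None when no hit is significant, i.e. the sequence stays unannotated.
    """
    significant = [h for h in hits if h.evalue <= cutoff]
    if not significant:
        return None
    # Lowest e-value = most significant hit (hypothetical selection rule).
    best = min(significant, key=lambda h: h.evalue)
    return Annotation(query_id, best.description, best.subject_id, list(best.go_terms))


if __name__ == "__main__":
    example_hits = [
        BlastHit("subject_A", "weak match (hypothetical)", 1e-3),
        BlastHit("subject_B", "strong match (hypothetical)", 1e-40, ["GO:EXAMPLE1"]),
    ]
    print(transfer_annotation("contig_0001", example_hits))
```

In this sketch, a sequence with no hit below the cutoff simply remains unannotated, which is one way a dataset ends up with only part of its sequences carrying a functional assignment.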