The Diatom EST Database
Open Access
- 17 December 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 33 (Database), D344-D347
- https://doi.org/10.1093/nar/gki121
Abstract
The Diatom EST Database provides integrated access to expressed sequence tag (EST) data from two eukaryotic microalgae of the class Bacillariophyceae, Phaeodactylum tricornutum and Thalassiosira pseudonana. The database currently contains close to 30 000 EST sequences, organized into PtDB, the P. tricornutum EST database, and TpDB, the T. pseudonana EST database. The EST sequences were clustered and assembled into a non-redundant set for each organism, and these non-redundant sequences were then annotated automatically using similarity searches against protein and domain databases. EST sequences, clusters of contiguous sequences, their annotation and analysis with reference to the publicly available databases, and a codon usage table derived from a subset of sequences from PtDB and TpDB can all be accessed in the Diatom EST Database. The underlying relational database management system (RDBMS) supports queries over the raw and annotated EST data and retrieval of information through a user-friendly web interface, with options to perform keyword and BLAST searches. The EST data can also be retrieved by the Pfam domains, Clusters of Orthologous Groups (COG) and Gene Ontology (GO) terms assigned to them by similarity searches. The database is available at http://avesthagen.sznbowler.com.
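The abstract does not describe how the codon usage table was computed. As a rough illustration of what such a table contains, the following is a minimal sketch that assumes the input is a set of in-frame coding sequences and reports each codon's frequency per thousand codons. The function name `codon_usage` and the example sequences are hypothetical and are not taken from PtDB or TpDB.

```python
from collections import Counter


def codon_usage(coding_sequences):
    """Return codon frequencies per thousand for in-frame coding sequences.

    Assumption (not from the source): every sequence starts at codon
    position 1. A trailing partial codon and any codon containing an
    ambiguous base (e.g. N) are ignored.
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
        usable = len(seq) - len(seq) % 3     # drop a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Two made-up coding sequences, for illustration only.
    table = codon_usage(["ATGGCTGCTTAA", "ATGGCCTAA"])
    for codon in sorted(table):
        print(f"{codon}\t{table[codon]:.1f}")
```

Frequencies per thousand make tables from sequence sets of different sizes directly comparable, which is why the sketch normalizes rather than reporting raw counts.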