prot4EST: Translating Expressed Sequence Tags from neglected genomes
Open Access
- 30 November 2004
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 5 (1), 187
- https://doi.org/10.1186/1471-2105-5-187
Abstract
Background: The genomes of an increasing number of species are being investigated through the generation of expressed sequence tags (ESTs). However, ESTs are prone to sequencing errors and typically define incomplete transcripts, which makes downstream annotation difficult. Annotation would be greatly improved by robust polypeptide translations. Many current solutions for EST translation require a large number of full-length gene sequences for training, a resource that is not available for the majority of EST projects.

Results: As part of our ongoing EST programs investigating these "neglected" genomes, we have developed a polypeptide prediction pipeline, prot4EST. It incorporates freely available software to produce final translations that are more accurate than those derived from any single method. We show that this integrated approach goes a long way towards overcoming the deficit in training data.

Conclusions: prot4EST provides a portable EST translation solution and can be usefully applied to >95% of EST projects to improve downstream annotation. It is freely available from http://www.nematodes.org/PartiGene.
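The abstract does not say which tools prot4EST combines, so the sketch below does not reproduce its method. It illustrates only one simple single-method baseline of the kind an integrated pipeline aims to improve on: translate an EST in all six reading frames with the standard genetic code and keep the longest stop-free stretch. Stop-to-stop stretches are used rather than ATG-initiated ORFs because ESTs often cover incomplete transcripts. The function names (`revcomp`, `translate`, `six_frame`, `longest_orf`) are hypothetical, and ambiguous codons are assumed to translate to `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order with the first base varying slowest.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def revcomp(seq: str) -> str:
    """Reverse complement; any non-ACGT base becomes N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(b, "N") for b in reversed(seq))


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) give X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict:
    """Return translations of frames +1..+3 and -1..-3."""
    seq = seq.upper().replace("U", "T")
    rc = revcomp(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_orf(seq: str):
    """Naive baseline: longest stop-free peptide across all six frames.

    Ties keep the first frame encountered (+1, -1, +2, -2, +3, -3).
    """
    best_frame, best_pep = None, ""
    for frame, protein in six_frame(seq).items():
        for segment in protein.split("*"):
            if len(segment) > len(best_pep):
                best_frame, best_pep = frame, segment
    return best_frame, best_pep


if __name__ == "__main__":
    print(six_frame("ATGGCCAAATAA"))
    print(longest_orf("ATGGCCAAATAA"))
```

On this toy input the forward frame carries the intended `ATG...TAA` reading (`MAK`), yet the longest stop-free stretch is `LFGH` on the reverse strand (frame -1). A heuristic like this, used alone, can pick the wrong frame, which is the kind of error that combining evidence from several methods is meant to reduce.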