A graph based algorithm for generating EST consensus sequences
Open Access
- 30 November 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (8), 1371-1375
- https://doi.org/10.1093/bioinformatics/bti184
Abstract
Motivation: EST sequences constitute an abundant, yet error-prone resource for computational biology. Expressed sequences are important in gene discovery and identification, and they are also crucial for the discovery and classification of alternative splicing. An important challenge when processing EST sequences is the reconstruction of mRNA by assembling EST clusters into consensus sequences.
Results: In contrast to the more established assembly tools, we propose an algorithm that constructs a graph over sequence fragments of fixed size, and produces consensus sequences as traversals of this graph. We provide a tool implementing this algorithm, and perform an experiment where the consensus sequences produced by our implementation, as well as by currently available tools, are compared to mRNA. The results show that our proposed algorithm in a majority of cases produces consensus sequences of higher quality than the established sequence assemblers, and at a competitive speed.
Availability: The source code for the implementation is available under a GPL license from http://www.ii.uib.no/~ketil/bioinformatics/
Contact: ketil@ii.uib.no
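The idea stated in the Results, a graph over fixed-size sequence fragments whose traversals yield consensus sequences, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_kmer_graph` and `greedy_consensus`, the use of overlapping k-mers as nodes, and the greedy choice of the best-supported successor are all assumptions made for illustration. The abstract does not specify how the graph is weighted or traversed.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Count how often each k-mer is directly followed by another k-mer.

    Nodes are fragments of fixed size k; an edge src -> dst is weighted by
    the number of times dst follows src (shifted by one base) in the reads.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k):
            src = read[i:i + k]
            dst = read[i + 1:i + k + 1]
            edges[src][dst] += 1
    return edges


def greedy_consensus(reads, k):
    """Walk the graph from a node without predecessors, always following
    the best-supported successor that has not been visited yet.

    Assumption: a greedy walk stands in for whatever traversal the
    published algorithm uses.
    """
    edges = build_kmer_graph(reads, k)
    if not edges:
        return ""
    targets = {dst for outs in edges.values() for dst in outs}
    starts = sorted(node for node in edges if node not in targets)
    node = starts[0] if starts else min(edges)
    consensus = node
    visited = {node}
    while node in edges:
        candidates = [
            (count, dst)
            for dst, count in edges[node].items()
            if dst not in visited
        ]
        if not candidates:
            break
        _, node = max(candidates)  # highest count wins; ties broken by k-mer
        visited.add(node)
        consensus += node[-1]  # each step extends the consensus by one base
    return consensus


# Three overlapping toy reads; the third carries a single-base error.
toy_reads = ["ATGGCGTGCA", "GGCGTGCAAT", "GGCGAGCAAT"]
print(greedy_consensus(toy_reads, 4))  # prints ATGGCGTGCAAT
```

In this toy cluster the erroneous read opens a side branch at `GGCG`, but the correct branch is supported by two reads and is therefore preferred, which shows how a traversal can average out sequencing errors. Real EST clusters also contain repeats and alternative splice forms, which a greedy walk like this one does not handle.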