An overview of the wcd EST clustering tool
Open Access
- 14 May 2008
- journal article
- review article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (13), 1542-1546
- https://doi.org/10.1093/bioinformatics/btn203
Abstract
Summary: The wcd system is an open source tool for clustering expressed sequence tags (EST) and other DNA and RNA sequences. wcd allows efficient all-versus-all comparison of ESTs using either the d2 distance function or edit distance, improving existing implementations of d2. It supports merging, refinement and reclustering of clusters. It is ‘drop in’ compatible with the StackPack clustering package. wcd supports parallelization under both shared memory and cluster architectures. It is distributed with an EMBOSS wrapper allowing wcd to be installed as part of an EMBOSS installation (and so provided by a web server).
Availability: wcd is distributed under a GPL licence and is available from http://code.google.com/p/wcdest
Contact: scott.hazelhurst@wits.ac.za
Supplementary information: Additional experimental results. The wcd manual, a companion paper describing underlying algorithms, and all datasets used for experimentation can also be found at www.bioinf.wits.ac.za/~scott/wcdsupp.html
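To make the idea of all-versus-all comparison with d2 concrete, here is a minimal sketch of the d2 distance as it is commonly described: the smallest sum of squared differences in word counts over all pairs of windows taken from two sequences, followed by single-linkage clustering of every pair under a threshold. This is not wcd's implementation. The function names, the word length `k=3`, the window length `8` and the threshold are all assumptions chosen for illustration, and the sketch ignores reverse complements and any of the optimizations a real tool would need.

```python
from collections import Counter
from itertools import combinations


def word_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def squared_diff(a, b):
    """Sum of squared differences between two word-count tables."""
    return sum((a[w] - b[w]) ** 2 for w in a.keys() | b.keys())


def windows(seq, window):
    """Yield each window of the given length (the whole sequence if shorter)."""
    if len(seq) <= window:
        yield seq
        return
    for i in range(len(seq) - window + 1):
        yield seq[i:i + window]


def d2(x, y, k=3, window=8):
    """Naive d2: minimum word-count distance over all window pairs."""
    x_tables = [word_counts(wx, k) for wx in windows(x, window)]
    y_tables = [word_counts(wy, k) for wy in windows(y, window)]
    return min(squared_diff(a, b) for a in x_tables for b in y_tables)


def cluster(seqs, threshold, k=3, window=8):
    """Single-linkage clustering: join any two sequences with d2 <= threshold."""
    parent = list(range(len(seqs)))  # union-find forest, one node per sequence

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-versus-all comparison: quadratic in the number of sequences.
    for i, j in combinations(range(len(seqs)), 2):
        if d2(seqs[i], seqs[j], k, window) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With the toy input `["ACGTACGTACGTAAA", "TTACGTACGTACG", "GGGGGCCCCCGGGGG"]` and a threshold of 4, the first two sequences share an identical window and fall into one cluster, while the third stays on its own. The naive version recomputes every window's word table from scratch, which is the kind of cost an efficient implementation avoids.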