SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters
Open Access
- 28 October 2004
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 5 (1), 171
- https://doi.org/10.1186/1471-2105-5-171
Abstract
Background: Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences with those in annotated databases is a useful source of functional and structural information about them. Using software such as the Basic Local Alignment Search Tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Search algorithms are intrinsically slow and data-intensive, a problem made worse by the rapid growth of biological sequence databases driven by high-throughput DNA sequencing techniques. Consequently, traditional bioinformatics tools become impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high-performance computation becomes necessary. Results: We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper uses a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It balances the load across the nodes of the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, through the same interface. Because this implementation does not alter the original program, newly obtained programs and program updates can be accommodated easily. Benchmark experiments applying QS-search to BLAST and HMMPFAM showed that it accelerated these programs almost linearly in proportion to the number of CPUs used.
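The abstract does not spell out how QS-search divides the query file among nodes. The Python sketch below illustrates one plausible query-segmentation step under stated assumptions: the total residue count of a segment stands in for its search cost, and queries are assigned longest-first to the currently lightest segment (a greedy longest-processing-time heuristic chosen here for illustration, not necessarily SS-Wrapper's own balancing rule). The helper names `parse_fasta`, `segment_queries`, and `to_fasta` are hypothetical.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence); hypothetical layout


def parse_fasta(text: str) -> List[Record]:
    """Split FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def segment_queries(records: List[Record], n_nodes: int) -> List[List[Record]]:
    """Assumed balancing rule: longest query first, always to the lightest segment."""
    heap = [(0, i) for i in range(n_nodes)]  # (residues assigned so far, segment index)
    heapq.heapify(heap)
    segments: List[List[Record]] = [[] for _ in range(n_nodes)]
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, idx = heapq.heappop(heap)
        segments[idx].append((header, seq))
        heapq.heappush(heap, (load + len(seq), idx))
    return segments


def to_fasta(records: List[Record]) -> str:
    """Serialize one segment back to FASTA so an unmodified search program can read it."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    queries = ">q1\nMKTAYIAKQR\n>q2\nMKV\n>q3\nMSTNPKPQRKTKRNTNRRPQDVKFPGG\n>q4\nMAAAAA\n"
    for i, seg in enumerate(segment_queries(parse_fasta(queries), 2)):
        # prints "0 ['q3'] 27" then "1 ['q1', 'q4', 'q2'] 19"
        print(i, [h for h, _ in seg], sum(len(s) for _, s in seg))
```

In this sketch, each segment would then be written to its own file and searched with the unmodified BLAST or HMMPFAM binary on one node, with the per-segment outputs concatenated afterwards; that dispatch step is omitted. Sizing segments by residues rather than by query count keeps one node from receiving all of the long sequences.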
We have also implemented a wrapper that uses a database segmentation approach (DS-BLAST), which provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. Conclusions: Used together, QS-search and DS-BLAST provide a flexible way to adapt sequential similarity search applications to high-performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture that assists both the seasoned bioinformaticist and the wet-bench biologist.
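In the database segmentation approach the work is divided the other way: each node searches the full query set against one fragment of the database. The Python sketch below, with hypothetical names (`split_database`, `merge_hits`, the `Hit` tuple layout, and the per-node residue budget), cuts a database into fragments that fit that budget and then merges the per-fragment hit lists by bit score. It assumes each fragment search was run with the effective database size set to that of the whole database; BLAST E-values scale with database size, so without that adjustment they would not be comparable across fragments. This is an illustration, not DS-BLAST's actual code.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]             # (header, sequence)
Hit = Tuple[str, str, float, float]  # hypothetical: (query_id, subject_id, bit_score, evalue)


def split_database(records: List[Record], max_residues: int) -> List[List[Record]]:
    """Cut the database, in order, into fragments of at most max_residues residues.

    A single sequence longer than max_residues gets a fragment of its own.
    """
    fragments: List[List[Record]] = []
    current: List[Record] = []
    size = 0
    for header, seq in records:
        if current and size + len(seq) > max_residues:
            fragments.append(current)
            current, size = [], 0
        current.append((header, seq))
        size += len(seq)
    if current:
        fragments.append(current)
    return fragments


def merge_hits(per_fragment: List[List[Hit]], max_hits: int) -> Dict[str, List[Hit]]:
    """Pool hits from all fragments and keep the best max_hits per query.

    Hits are ranked by descending bit score, ties broken by ascending E-value.
    """
    by_query: Dict[str, List[Hit]] = {}
    for hits in per_fragment:
        for hit in hits:
            by_query.setdefault(hit[0], []).append(hit)
    return {
        q: sorted(hits, key=lambda h: (-h[2], h[3]))[:max_hits]
        for q, hits in by_query.items()
    }


if __name__ == "__main__":
    db = [("s1", "A" * 40), ("s2", "C" * 70), ("s3", "G" * 20), ("s4", "T" * 150)]
    # prints [['s1'], ['s2', 's3'], ['s4']]
    print([[h for h, _ in f] for f in split_database(db, 100)])
```

Ranking on bit score in the merge step is a deliberate choice in this sketch: unlike the E-value, the bit score does not depend on database size, so it stays comparable even if a fragment search was run without the size adjustment.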