CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications
- 1 December 2008
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 222-229
- https://doi.org/10.1109/escience.2008.62
Abstract
This paper proposes and evaluates an approach to the parallelization, deployment and management of bioinformatics applications that integrates several emerging technologies for distributed computing. The proposed approach uses the MapReduce paradigm to parallelize tools and manage their execution, machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly deployable virtual machines, and network virtualization to connect resources behind firewalls/NATs while preserving the necessary performance and the communication environment. An implementation of this approach is described and used to demonstrate and evaluate it. The implementation integrates Hadoop, Virtual Workspaces, and ViNe as the MapReduce, virtual machine and virtual network technologies, respectively, to deploy the commonly used bioinformatics tool NCBI BLAST on a WAN-based test bed consisting of clusters at two distinct locations, the University of Florida and the University of Chicago. This WAN-based implementation, called CloudBLAST, was evaluated against both non-virtualized and LAN-based implementations in order to assess the overheads of machine and network virtualization, which were shown to be insignificant. To compare the proposed approach against an MPI-based solution, CloudBLAST performance was experimentally contrasted against the publicly available mpiBLAST on the same WAN-based test bed. Both versions demonstrated performance gains as the number of available processors increased, with CloudBLAST delivering a speedup of 57 against 52.4 for the MPI version when 64 processors at 2 sites were used. The results encourage the use of the proposed approach for the execution of large-scale bioinformatics applications on emerging distributed environments that provide access to computing resources as a service.
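To put the reported speedups in context, the short Python sketch below converts them to parallel efficiency (speedup divided by processor count). Efficiency is a derived figure that the abstract does not report, and the function and variable names are illustrative. The abstract also does not state the baseline used for the speedups, so the sketch assumes both are measured against the same reference run.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 is ideal linear scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Speedups reported in the abstract for 64 processors across 2 sites.
REPORTED_SPEEDUPS = {"CloudBLAST": 57.0, "mpiBLAST": 52.4}
PROCESSORS = 64

# Derived metric (not stated in the abstract): fraction of ideal scaling achieved.
efficiencies = {
    name: parallel_efficiency(speedup, PROCESSORS)
    for name, speedup in REPORTED_SPEEDUPS.items()
}

for name, efficiency in efficiencies.items():
    print(f"{name}: speedup {REPORTED_SPEEDUPS[name]:g}, efficiency {efficiency:.2f}")
```

Under these assumptions, CloudBLAST runs at roughly 0.89 of ideal linear scaling and mpiBLAST at roughly 0.82 on the 64-processor, two-site configuration.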