Resilin: Elastic MapReduce over Multiple Clouds
- 1 May 2013
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 261-268
- https://doi.org/10.1109/ccgrid.2013.48
Abstract
The MapReduce programming model offers a simple and efficient way of performing distributed computation over large data sets. To enable the usage of MapReduce in the cloud, Amazon Web Services offers Elastic MapReduce (EMR), a web service enabling users to easily run MapReduce jobs by leveraging Amazon resources (i.e. compute and storage). EMR takes care of tasks such as resource provisioning, performance tuning, and fault tolerance, thus allowing users to concentrate on the problem to be solved. However, EMR is restricted to Amazon's resources and is provided at an additional cost. In this paper, we present the design, implementation, and evaluation of Resilin, a novel EMR API-compatible system to perform distributed MapReduce computations. Resilin goes one step beyond Amazon's proprietary EMR solution and allows users (e.g. companies, scientists) to leverage resources from one or multiple public and/or private clouds. This gives Resilin users the opportunity to perform MapReduce computations over a large number of potentially geographically distributed resources. An extensive experimental evaluation conducted on multiple clusters of the Grid'5000 experimentation test bed shows that Resilin enables the use of geographically distributed resources with only limited impact on MapReduce job execution time.