Data replication strategies in grid environments

Abstract
Data grids provide geographically distributed resources for large-scale, data-intensive applications that generate very large data sets. However, ensuring efficient and fast access to such huge, widely distributed data is hindered by the high latencies of the Internet. To address this problem, we introduce a set of replication management services and protocols that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Replication decisions are made according to a cost model that evaluates the data access costs and the performance gains of creating each replica. The estimation of costs and gains is based on factors such as run-time accumulated read/write statistics, response time, bandwidth, and replica size. To address scalability, replicas are organized in a combination of hierarchical and flat topologies that form propagation graphs minimizing inter-replica communication costs. To evaluate our model, we use the network simulator NS to study the impact of replication. Our results show that replication improves the performance of data access on the data grid, and that the gain increases with the size of the data used.
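
The sketch below illustrates the kind of cost/gain comparison the abstract describes: a replica is created at a site only when the estimated access-cost savings exceed the estimated cost of creating and maintaining it, using run-time read/write statistics, response times, bandwidth, and replica size. All names, data-structure fields, and the exact formulas are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch of a replication cost model (assumed, for illustration only).
from dataclasses import dataclass


@dataclass
class AccessStats:
    """Run-time accumulated statistics for one candidate replica site."""
    reads: int            # number of remote reads observed so far
    writes: int           # number of writes that would need propagation
    remote_rt: float      # average response time of a remote read (s)
    local_rt: float       # expected response time of a local read (s)
    bandwidth: float      # available bandwidth to the source replica (MB/s)
    replica_size: float   # size of the data set to replicate (MB)


def replication_gain(s: AccessStats) -> float:
    """Estimated time saved by serving future reads from a local replica."""
    return s.reads * (s.remote_rt - s.local_rt)


def replication_cost(s: AccessStats) -> float:
    """Estimated cost: initial transfer plus write propagation.
    Assumes, conservatively, that each write retransmits the whole replica."""
    transfer = s.replica_size / s.bandwidth
    update_propagation = s.writes * (s.replica_size / s.bandwidth)
    return transfer + update_propagation


def should_replicate(s: AccessStats) -> bool:
    """Create the replica only when the expected gain outweighs the cost."""
    return replication_gain(s) > replication_cost(s)


if __name__ == "__main__":
    stats = AccessStats(reads=400, writes=5, remote_rt=2.0, local_rt=0.1,
                        bandwidth=10.0, replica_size=500.0)
    print("replicate:", should_replicate(stats))
```

In this sketch, read-heavy workloads favor replication while write-heavy workloads suppress it, which mirrors the trade-off the cost model is meant to capture; the real model would also account for topology-dependent propagation costs.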
