Data replication strategies in grid environments
- 26 June 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Data grids provide geographically distributed resources for large-scale data-intensive applications that generate large data sets. However, ensuring efficient and fast access to such huge and widely distributed data is hindered by the high latencies of the Internet. To address these problems we introduce a set of replication management services and protocols that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Replication decisions are made based on a cost model that evaluates data access costs and performance gains of creating each replica. The estimation of costs and gains is based on factors such as run-time accumulated read/write statistics, response time, bandwidth, and replica size. To address scalability, replicas are organized in a combination of hierarchical and flat topologies that represent propagation graphs that minimize inter-replica communication costs. To evaluate our model we use the network simulator NS to study the impact of replication. Our results prove that replication improves the performance of data access on the data grid, and that the gain increases with the size of data used.
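The abstract names the inputs to the replication cost model (read/write statistics, response time, bandwidth, replica size) but not its formulas. Purely as an illustration, the Python sketch below shows one way such inputs could be combined into a per-replica cost/gain estimate. The names `AccessStats`, `Link`, `transfer_time`, `replication_gain` and `should_replicate` are hypothetical. So are the cost rules: every remote read or write moves the whole file, and creating a replica costs one transfer. This is not the authors' model.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    """Hypothetical run-time counters for one file, seen from a candidate replica site."""
    reads: int   # read requests issued from the candidate site
    writes: int  # updates a replica at that site would have to receive


@dataclass
class Link:
    """Hypothetical network characteristics of the path between two sites."""
    latency_s: float       # response time in seconds
    bandwidth_mbps: float  # available bandwidth in megabits per second


def transfer_time(size_mb: float, link: Link) -> float:
    """Seconds to move size_mb megabytes over link: fixed latency plus size / bandwidth."""
    return link.latency_s + (size_mb * 8.0) / link.bandwidth_mbps


def replication_gain(stats: AccessStats, size_mb: float,
                     remote: Link, local: Link) -> float:
    """Estimated seconds saved by placing a replica at the candidate site.

    Assumptions (not taken from the paper): without a replica every read
    fetches the whole file over the remote link; with a replica, reads are
    served over the local link, every write pushes the whole file to the
    replica over the remote link, and creating the replica costs one transfer.
    """
    cost_without = stats.reads * transfer_time(size_mb, remote)
    cost_with = (stats.reads * transfer_time(size_mb, local)
                 + stats.writes * transfer_time(size_mb, remote)
                 + transfer_time(size_mb, remote))  # one-time creation cost
    return cost_without - cost_with


def should_replicate(stats: AccessStats, size_mb: float,
                     remote: Link, local: Link) -> bool:
    """Create the replica only if the estimated gain is positive."""
    return replication_gain(stats, size_mb, remote, local) > 0.0


if __name__ == "__main__":
    # Illustrative numbers only: a slow wide-area link and a fast local link.
    remote = Link(latency_s=0.1, bandwidth_mbps=100.0)
    local = Link(latency_s=0.001, bandwidth_mbps=1000.0)
    read_heavy = AccessStats(reads=10, writes=1)
    write_heavy = AccessStats(reads=1, writes=10)
    for size in (100.0, 1000.0):
        print(size,
              replication_gain(read_heavy, size, remote, local),
              should_replicate(write_heavy, size, remote, local))
```

With these illustrative numbers, the read-heavy file yields a positive gain that grows with file size. The write-heavy file is never replicated, because propagating its updates costs more than the local reads save.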