Replication degree customization for high availability
- 25 April 2008
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGOPS Operating Systems Review
- Vol. 42 (4), 55-68
- https://doi.org/10.1145/1357010.1352599
Abstract
Object replication is a common approach to enhancing the availability of distributed data-intensive services and storage systems. Many such systems are known to have highly skewed object request probability distributions. In this paper, we propose an object replication degree customization scheme that maximizes the expected service availability under given object request probabilities, object sizes, and space constraints (e.g., memory/storage capacities). In particular, we discover that the optimal replication degree of an object should be linear in the logarithm of its popularity-to-size ratio. We also study the feasibility and effectiveness of our proposed scheme using applications driven by real-life system object request traces and machine failure traces. When the data object popularity distribution is known a priori, our proposed customization can achieve an increase of 1.32-2.92 "nines" in system availability (or 21-74% space savings at the same availability level) compared to uniform replication. Results also suggest that our scheme requires only a moderate amount of replica creation/removal overhead under realistic object request popularity changes: weekly changes involve no more than 0.24% of objects and no more than 0.11% of total data size.
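The "linear in the logarithm of the popularity-to-size ratio" rule can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a simplified model that the abstract does not spell out: each replica fails independently with probability `fail_prob`, an object is unavailable only when all of its replicas fail, and replication degrees may be fractional (no rounding, no lower bound). Under those assumptions a Lagrange-multiplier argument gives k_i = a + log(p_i / s_i) / log(1 / fail_prob), with the offset a set by the space budget. All function names and input values below are hypothetical.

```python
import math

def customized_degrees(popularity, sizes, capacity, fail_prob):
    """Fractional replication degrees k_i = a + log(p_i/s_i) / log(1/f).

    Assumed model (not from the abstract): independent replica failures
    with probability fail_prob, continuous relaxation of the degrees.
    """
    log_inv_f = math.log(1.0 / fail_prob)
    # Per-object term that is linear in log(popularity / size).
    terms = [math.log(p / s) / log_inv_f for p, s in zip(popularity, sizes)]
    # Pick the common offset so that sum(s_i * k_i) equals the space budget.
    offset = (capacity - sum(s * t for s, t in zip(sizes, terms))) / sum(sizes)
    return [offset + t for t in terms]

def expected_unavailability(popularity, degrees, fail_prob):
    """Probability that a request hits an object with no live replica."""
    return sum(p * fail_prob ** k for p, k in zip(popularity, degrees))

def nines(unavailability):
    """Availability expressed in 'nines', e.g. 0.001 unavailability -> 3."""
    return -math.log10(unavailability)

# Hypothetical inputs: three objects with skewed popularity.
popularity = [0.7, 0.2, 0.1]   # request probabilities, summing to 1
sizes = [1.0, 1.0, 2.0]        # object sizes in arbitrary units
capacity = 12.0                # same space as 3 replicas of every object
fail_prob = 0.1                # assumed per-replica failure probability

custom = customized_degrees(popularity, sizes, capacity, fail_prob)
uniform = [capacity / sum(sizes)] * len(sizes)

custom_nines = nines(expected_unavailability(popularity, custom, fail_prob))
uniform_nines = nines(expected_unavailability(popularity, uniform, fail_prob))
print("customized degrees:", [round(k, 3) for k in custom])
print("nines (customized vs uniform):", round(custom_nines, 3), round(uniform_nines, 3))
```

In this toy setting the popular, small object receives the most replicas and the large, unpopular one the fewest, while total space stays equal to the uniform baseline. A practical scheme would additionally need integer degrees and a minimum of one replica per object, which this sketch omits.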