Cheap recovery

1 February 2005

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Storage

Vol. 1 (1), 38-70
https://doi.org/10.1145/1044956.1044959

Abstract

Cluster hash tables (CHTs) are key components of many large-scale Internet services due to their highly scalable performance and the prevalence of the type of data they store. Another advantage of CHTs is that they can be designed to be as self-managing as a cluster of stateless servers. One key to achieving this extreme manageability is reboot-based recovery that is predictably fast and has modest impact on system performance and availability. This "cheap" recovery mechanism simplifies management in two ways. First, it simplifies failure detection by lowering the cost of acting on false positives. This enables one to use statistical techniques to turn hard-to-catch failures, such as node degradation, into failure followed by recovery. Second, cheap recovery simplifies capacity planning by recasting repartitioning as failure plus recovery to achieve zero-downtime incremental scaling. These low-cost recovery and scaling mechanisms make it possible for the system to be continuously self-adjusting, a key property of self-managing systems.
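
The abstract does not say which statistical techniques the authors use, so the following is only an illustrative sketch of the general idea: compare each node against its peers, and treat any node that looks degraded as failed so that it can be rebooted. It assumes a median-absolute-deviation outlier test on per-node latency. The function name `nodes_to_reboot`, the threshold, and the sample data are hypothetical and not taken from the paper.

```python
from statistics import median


def nodes_to_reboot(latency_ms, threshold=3.0):
    """Return ids of nodes whose latency is an upward outlier among peers.

    Assumption (not from the paper): an outlier is a node whose latency
    exceeds the cluster median by more than `threshold` scaled MADs.
    """
    values = list(latency_ms.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that one slow node
    # cannot inflate the way it would inflate a standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All peers look (nearly) identical; nothing to flag.
        return []
    # 1.4826 makes the MAD comparable to a standard deviation
    # for normally distributed data.
    scale = 1.4826 * mad
    return sorted(
        node for node, v in latency_ms.items() if (v - med) / scale > threshold
    )


# Hypothetical sample: four healthy nodes and one degraded node.
latencies = {"n1": 10.0, "n2": 11.0, "n3": 9.5, "n4": 10.5, "n5": 48.0}
print(nodes_to_reboot(latencies))
```

Because recovery is cheap, a detector like this can afford an aggressive threshold: rebooting a healthy node by mistake costs little, which is the point the abstract makes about false positives.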
