BASE
- 21 October 2001
- Proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 35 (5), 15-28
- https://doi.org/10.1145/502034.502037
Abstract
Software errors are a major cause of outages and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors but it is expensive to deploy. This paper describes a replication technique, BASE, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BASE reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or non-deterministic service implementations, which reduces the probability of common mode failures. We built an NFS service where each replica can run a different off-the-shelf file system implementation, and an object-oriented database where the replicas ran the same, non-deterministic implementation. These examples suggest that our technique can be used in practice: in both cases, the implementation required only a modest amount of new code, and our performance results indicate that the replicated services perform comparably to the implementations that they reuse.
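The abstract describes the mechanism only at a high level. Below is a minimal Python sketch of the general idea under stated assumptions; it is not the paper's implementation or interface. Two hypothetical file stores (`ImplA`, `ImplB`) use different handle schemes, and `ImplB` also records a replica-local `atime`. A hypothetical `ConformanceWrapper` assigns the same abstract identifiers on every replica and exposes an abstraction function (`abstract_state`), so correct replicas produce matching digests even though their concrete states differ. A hypothetical `repair` function rebuilds a faulty replica from any abstract state reported by at least f + 1 replicas, which guarantees that at least one report came from a correct replica if at most f are faulty. Only three replicas are used to keep the sketch short, and transferring the whole state at once is a simplifying assumption.

```python
import hashlib
import json
from collections import Counter


class ImplA:
    """Hypothetical implementation A: sequential integer handles."""

    def __init__(self):
        self.next_handle = 100
        self.files = {}  # handle -> (name, data)

    def create(self, name, data):
        handle = self.next_handle
        self.next_handle += 1
        self.files[handle] = (name, data)
        return handle


class ImplB:
    """Hypothetical implementation B: hash-based handles plus extra metadata."""

    def __init__(self, clock=12345):
        self.clock = clock  # stands in for a replica-local, non-deterministic clock
        self.files = {}  # handle -> dict with name, data, atime

    def create(self, name, data):
        handle = "fh-" + hashlib.sha1(name.encode()).hexdigest()[:12]
        self.files[handle] = {"name": name, "data": data, "atime": self.clock}
        return handle


def read_a(impl, handle):
    return impl.files[handle]


def read_b(impl, handle):
    entry = impl.files[handle]
    return (entry["name"], entry["data"])  # atime is not part of the abstract state


class ConformanceWrapper:
    """Maps deterministic abstract ids to each implementation's concrete handles."""

    def __init__(self, impl, read_file):
        self.impl = impl
        self.read_file = read_file
        self.abs_to_conc = []  # index = abstract id, value = concrete handle

    def create(self, name, data):
        self.abs_to_conc.append(self.impl.create(name, data))
        return len(self.abs_to_conc) - 1  # same id on every correct replica

    def abstract_state(self):
        # Abstraction function: keep only what the common specification defines.
        return [list(self.read_file(self.impl, h)) for h in self.abs_to_conc]

    def digest(self):
        return hashlib.sha256(json.dumps(self.abstract_state()).encode()).hexdigest()


def repair(wrapper, fresh_impl, reported_states, f):
    """Rebuild a replica from an abstract state reported by at least f + 1 replicas."""
    counts = Counter(json.dumps(s) for s in reported_states)
    top = counts.most_common(1)
    if not top or top[0][1] < f + 1:
        raise RuntimeError("no abstract state is vouched for by f + 1 replicas")
    # Inverse abstraction: replay the abstract objects into a fresh implementation.
    wrapper.impl = fresh_impl
    wrapper.abs_to_conc = []
    for name, data in json.loads(top[0][0]):
        wrapper.create(name, data)


a = ConformanceWrapper(ImplA(), read_a)
b = ConformanceWrapper(ImplB(clock=999), read_b)
c = ConformanceWrapper(ImplA(), read_a)
for replica in (a, b, c):
    replica.create("notes.txt", "hello")
    replica.create("todo.txt", "fix bug")
print(a.digest() == b.digest())  # True: different concrete states, same abstract state

c.impl.files[100] = ("notes.txt", "garbage")  # simulate a corrupted replica
print(c.digest() == a.digest())  # False
repair(c, ImplA(), [r.abstract_state() for r in (a, b, c)], f=1)
print(c.digest() == a.digest())  # True
```

Rebuilding the entire state is the simplest possible choice for a sketch; a practical system would presumably compare per-object digests and transfer only the abstract objects that differ.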
This publication has 13 references indexed in Scilit:
- Faulty version recovery in object-oriented N-version programming. IEE Proceedings - Software, 2000
- HAC. Published by Association for Computing Machinery (ACM), 1997
- Efficient optimistic concurrency control using loosely synchronized clocks. Published by Association for Computing Machinery (ACM), 1995
- Hypervisor-based fault tolerance. Published by Association for Computing Machinery (ACM), 1995
- Replication in the Harp file system. Published by Association for Computing Machinery (ACM), 1991
- Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, 1990
- Scale and performance in a distributed file system. ACM Transactions on Computer Systems, 1988
- Replicated distributed programs. Published by Association for Computing Machinery (ACM), 1985
- Reaching Agreement in the Presence of Faults. Journal of the ACM, 1980
- Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 1978