Deterministic Replay for Transparent Recovery in Component-Oriented Middleware
- 1 June 2009
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 615-622
- https://doi.org/10.1109/icdcs.2009.79
Abstract
We present and evaluate a low-overhead approach for achieving high availability in distributed event-processing middleware systems consisting of networks of stateful software components that communicate by either one-way (send) or two-way (call) messages. The approach is based on transparently augmenting each component to produce a deterministic component whose state can be recovered by checkpoint and replay. Determinism is achieved by augmenting messages with virtual times, and by scheduling message handling in virtual time order. Scheduling delays are reduced by computing virtual times with estimators: deterministic functions that approximate the expected real times of arrival. We describe our algorithms, show how Java components can be transparently augmented with checkpointing code and with good estimators, discuss how our deterministic runtime can be tuned to reduce overhead, and provide experimental results to measure the overhead of determinism relative to non-determinism.
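The core idea in the abstract, stamping messages with virtual times and handling them in virtual-time order, can be illustrated with a minimal Java sketch. It is not the paper's implementation: the class name `VirtualTimeSketch`, the method names, and the estimator constants (a fixed link delay plus a per-character cost) are all hypothetical, chosen only to show why a deterministic estimator makes the handling order independent of the real arrival order. The sketch also assumes every message has already arrived before handling starts; a real runtime would additionally have to establish that no earlier-stamped message can still arrive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only; all names and constants here are hypothetical.
final class VirtualTimeSketch {

    /** A message carrying the virtual time at which it should be handled. */
    static final class Message {
        final String sender;
        final long virtualTime;
        final String payload;

        Message(String sender, long virtualTime, String payload) {
            this.sender = sender;
            this.virtualTime = virtualTime;
            this.payload = payload;
        }
    }

    /**
     * Assumed estimator: a pure function of the sender's virtual time and the
     * payload size. Because it reads no clocks and no random state, replaying
     * the same sends always reproduces the same virtual times.
     */
    static long estimateArrival(long sendVirtualTime, String payload) {
        final long assumedLinkDelay = 50;   // hypothetical latency, in ticks
        final long assumedCostPerChar = 2;  // hypothetical per-character cost
        return sendVirtualTime + assumedLinkDelay + assumedCostPerChar * payload.length();
    }

    /**
     * Handles messages in virtual-time order regardless of the order in which
     * they arrived. Ties are broken by sender name so the order is total.
     */
    static List<String> handleInVirtualTimeOrder(List<Message> arrivals) {
        PriorityQueue<Message> pending = new PriorityQueue<>(
            Comparator.comparingLong((Message m) -> m.virtualTime)
                      .thenComparing(m -> m.sender));
        pending.addAll(arrivals);

        List<String> handled = new ArrayList<>();
        while (!pending.isEmpty()) {
            Message m = pending.poll();
            handled.add(m.virtualTime + ":" + m.payload);
        }
        return handled;
    }

    public static void main(String[] args) {
        Message a = new Message("A", estimateArrival(100, "hello"), "hello"); // 100 + 50 + 10 = 160
        Message b = new Message("B", estimateArrival(105, "hi"), "hi");       // 105 + 50 + 4  = 159

        // Two different real arrival orders yield the same handling order.
        System.out.println(handleInVirtualTimeOrder(List.of(a, b))); // [159:hi, 160:hello]
        System.out.println(handleInVirtualTimeOrder(List.of(b, a))); // [159:hi, 160:hello]
    }
}
```

Because the handling order depends only on the stamped virtual times, a component restored from a checkpoint and fed the same messages again would process them in the same sequence, which is the property that checkpoint-and-replay recovery relies on.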