Active replication of multithreaded applications
- 3 April 2006
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 17 (5), pp. 448-465
- https://doi.org/10.1109/tpds.2006.56
Abstract
Software-based active replication is expensive in terms of performance overhead. Multithreading can help improve performance; however, thread scheduling is a source of nondeterminism in replica behavior. To achieve strong replica consistency in multithreaded environments, this paper proposes intercepting the mutex lock/unlock operations that threads perform when accessing shared data, and contributes two algorithmic solutions: 1) a loose synchronization algorithm (LSA), which captures the natural concurrency in a leader replica and projects it onto follower replicas through interreplica communication, and 2) a preemptive deterministic scheduler (PDS) algorithm, which removes the need for interreplica communication through the notion of rounds and by suspending threads that it cannot (yet) schedule deterministically. The failure behavior and performance of the LSA and PDS implementations are evaluated in a triplicated system and compared with existing solutions. A performance evaluation indicates that LSA and PDS outperform existing solutions, with PDS offering lower throughput than LSA. A fault-injection campaign shows that PDS is more robust to errors due to the absence of interreplica communication. Hence, LSA and PDS represent a trade-off between performance and dependability. Finally, LSA and PDS are demonstrated in replicating the Apache Web server, a substantial real-world application.
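The following minimal Python sketch illustrates the LSA idea of projecting a leader's lock order onto a follower. It is not the paper's implementation. It assumes a single mutex and a leader that logs only the identity of each thread in the order the threads acquire that mutex. It also assumes the log reaches the follower intact, so interreplica communication is modeled as a plain list. The names `LeaderMutex`, `FollowerMutex`, `worker`, and `run` are hypothetical. The follower grants the mutex only to the thread that the leader's log names next, so both replicas update shared state in the same order.

```python
import threading
from collections import deque

# Simplified LSA-style sketch (assumptions): one mutex, and the leader's
# acquisition log reaches the follower intact as a plain Python list.


class LeaderMutex:
    """Leader replica: acquire freely, then record which thread got the lock."""

    def __init__(self, log):
        self._lock = threading.Lock()
        self._log = log  # stands in for messages sent to follower replicas

    def acquire(self, tid):
        self._lock.acquire()
        # Appended while holding the lock, so log order equals acquisition order.
        self._log.append(tid)

    def release(self):
        self._lock.release()


class FollowerMutex:
    """Follower replica: grant the lock only to the thread the leader's log names next."""

    def __init__(self, order):
        self._order = deque(order)
        self._held = False
        self._cv = threading.Condition()

    def acquire(self, tid):
        with self._cv:
            self._cv.wait_for(
                lambda: not self._held and bool(self._order) and self._order[0] == tid
            )
            self._order.popleft()
            self._held = True

    def release(self):
        with self._cv:
            self._held = False
            self._cv.notify_all()  # wake waiters so the next logged thread can proceed


def worker(mutex, tid, rounds, history):
    for _ in range(rounds):
        mutex.acquire(tid)
        history.append(tid)  # update shared state inside the critical section
        mutex.release()


def run(mutex, n_threads=4, rounds=50):
    history = []
    threads = [
        threading.Thread(target=worker, args=(mutex, t, rounds, history))
        for t in range(n_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return history


log = []
leader_hist = run(LeaderMutex(log))      # interleaving chosen by the OS scheduler
follower_hist = run(FollowerMutex(log))  # replays the leader's lock order
assert follower_hist == leader_hist
```

In this sketch, only the lock acquisition order passes between replicas. Follower threads still start and run concurrently, and they block only at the intercepted `acquire` call until their logged turn arrives.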