Reducing message logging overhead for log-based recovery
- 30 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 1925-1928 vol.3
- https://doi.org/10.1109/iscas.1993.394126
Abstract
Checkpointing and rollback recovery is essential for long-running parallel applications. In the case of a transient fault or system crash, the affected application programs can recover from a consistent set of checkpoints saved earlier instead of restarting from the very beginning. For applications requiring transparent fault tolerance, log-based recovery can usually achieve a better recoverable state at the cost of message logging in addition to checkpointing. A simple scheme that uses local dependency information to reduce message logging overhead is presented. Communication trace-driven simulation of several parallel applications is used to evaluate the benefits of the proposed scheme for real applications.
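To make the checkpoint-plus-log idea concrete, here is a minimal sketch of generic log-based rollback recovery with pessimistic (log-before-process) receiver-side logging. It is not the paper's dependency-based scheme, whose details are not given here. The `Process` class, its integer messages and its `{"sum", "count"}` state are hypothetical, and the sketch assumes piecewise deterministic execution: a process's state is fully determined by its last checkpoint plus the messages it has received since. In-memory fields stand in for stable storage.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Process:
    """Hypothetical piecewise-deterministic process: its state is fully
    determined by its last checkpoint plus the messages received since."""
    state: Optional[dict] = field(default_factory=lambda: {"sum": 0, "count": 0})
    checkpoint: dict = field(default_factory=dict)   # assumed to survive crashes
    message_log: list = field(default_factory=list)  # assumed to survive crashes

    def __post_init__(self) -> None:
        # Take an initial checkpoint so recovery is always possible.
        self.take_checkpoint()

    def take_checkpoint(self) -> None:
        # Save a copy of the current state. Messages received before this
        # point are no longer needed for replay, so the log is truncated.
        self.checkpoint = copy.deepcopy(self.state)
        self.message_log.clear()

    def deliver(self, msg: int) -> None:
        # Pessimistic logging: record the message before processing it, so a
        # crash never loses a message the state already depends on.
        self.message_log.append(msg)
        self._apply(msg)

    def _apply(self, msg: int) -> None:
        # Deterministic state transition for one received message.
        self.state["sum"] += msg
        self.state["count"] += 1

    def crash(self) -> None:
        # Volatile state is lost; checkpoint and message log are kept.
        self.state = None

    def recover(self) -> None:
        # Roll back to the checkpoint, then replay logged messages in order.
        self.state = copy.deepcopy(self.checkpoint)
        for msg in self.message_log:
            self._apply(msg)
```

For example, delivering 3 and 4, checkpointing, delivering 5 and 6 and then crashing leaves a log of `[5, 6]`; `recover()` restores `{"sum": 7, "count": 2}` and replays the log to reach `{"sum": 18, "count": 4}` again. In this sketch every delivered message costs one log write, which illustrates the kind of overhead a logging-reduction scheme targets.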