Distributed checkpointing based on influential messages
- 23 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
In distributed applications, a group of objects cooperate to achieve some objectives. Computation among the objects is based on message passing in the form of remote procedure calls. The objects may suffer from different kinds of faults. In the presence of object faults, the states of the objects in the system have to be kept consistent. If some object o is faulty, o is rolled back to its checkpoint, and the objects that have received messages from o are also required to be rolled back. In this paper, we define influential messages. A message is influential if, based on the message semantics, its receiver must be rolled back from the application point of view whenever its sender is rolled back. Using influential messages, we define a significant checkpoint. A significant checkpoint denotes a consistent global state of the system, even though it might be inconsistent under the traditional definition. We present protocols for taking significant checkpoints and for rolling back objects by using influential messages.
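The distinction between the traditional and the influential-message views can be illustrated with a small model. The following is a minimal sketch, not the authors' protocol. The `Message` type, its `influential` flag, and the functions `is_orphan`, `traditionally_consistent`, `is_significant`, and `objects_to_roll_back` are all hypothetical names introduced here for illustration.

The sketch rests on several assumptions:

- Whether a message is influential is given as a precomputed boolean. The paper derives it from message semantics.
- Each object keeps only its latest checkpoint.
- A checkpoint records the first k events of an object's local history.
- An orphan is a message whose receipt is recorded in the checkpoint while its send is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    send_seq: int      # 1-based index of the send event in the sender's local history
    recv_seq: int      # 1-based index of the receive event in the receiver's local history
    influential: bool  # assumed given; the paper derives this from message semantics


# Hypothetical checkpoint model: object -> number of its local events the checkpoint records.
Checkpoint = dict[str, int]


def is_orphan(m: Message, cp: Checkpoint) -> bool:
    """True if the receive is recorded in the checkpoint but the matching send is not."""
    return m.send_seq > cp[m.sender] and m.recv_seq <= cp[m.receiver]


def traditionally_consistent(msgs: list[Message], cp: Checkpoint) -> bool:
    """Traditional definition: no orphan message of any kind."""
    return not any(is_orphan(m, cp) for m in msgs)


def is_significant(msgs: list[Message], cp: Checkpoint) -> bool:
    """Sketch of the relaxed notion: only orphan *influential* messages break consistency."""
    return not any(m.influential and is_orphan(m, cp) for m in msgs)


def objects_to_roll_back(faulty: str, msgs: list[Message], cp: Checkpoint,
                         influential_only: bool = True) -> set[str]:
    """Objects that roll back to their checkpoints when `faulty` does.

    Rolling an object back undoes every send after its checkpoint. The receiver of
    such a message is added and processed transitively. The sketch assumes that receipt
    happened after the receiver's checkpoint, so rolling back to that checkpoint undoes it.
    With influential_only=True, only influential messages propagate the rollback.
    """
    rolled = {faulty}
    pending = [faulty]
    while pending:
        obj = pending.pop()
        for m in msgs:
            undone = m.sender == obj and m.send_seq > cp[obj]
            propagates = m.influential or not influential_only
            if undone and propagates and m.receiver not in rolled:
                rolled.add(m.receiver)
                pending.append(m.receiver)
    return rolled


# Hypothetical example with three objects A, B, C.
CHECKPOINT: Checkpoint = {"A": 2, "B": 2, "C": 1}
MESSAGES = [
    Message("A", "B", send_seq=3, recv_seq=3, influential=True),
    Message("A", "C", send_seq=4, recv_seq=3, influential=False),
    Message("B", "C", send_seq=4, recv_seq=4, influential=False),
    Message("C", "A", send_seq=2, recv_seq=1, influential=False),  # orphan, but not influential
]

if __name__ == "__main__":
    print(traditionally_consistent(MESSAGES, CHECKPOINT))   # False: C->A is an orphan
    print(is_significant(MESSAGES, CHECKPOINT))             # True: that orphan is not influential
    print(sorted(objects_to_roll_back("A", MESSAGES, CHECKPOINT)))
    print(sorted(objects_to_roll_back("A", MESSAGES, CHECKPOINT, influential_only=False)))
```

In this example, the global checkpoint fails the traditional test because of the non-influential orphan from C to A. It is still accepted as significant. When A is rolled back, only B has to follow it, because B received an influential message. The traditional rule would also drag C back.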