Automatic service availability management in asynchronous distributed systems

17 December 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. se 11, 58-68
https://doi.org/10.1109/iwcds.1994.289935

Abstract

An availability management service is responsible for automatically ensuring that all critical services of a distributed system remain continuously available to users despite node removals and restarts caused by failures, maintenance, and growth. We present an availability management service for an asynchronous distributed system characterized by unbounded communication delays and by the availability at all nodes of local, nonsynchronized timers that measure the passage of real time with some known accuracy. Examples of such systems are Unix-, VMS-, VM-, or MVS-based distributed systems connected by local area networks such as Ethernet, token ring, FDDI, or channel-to-channel adapters. The presentation stresses the main ideas behind this new service and outlines a simple design that depends upon the existence of asynchronous membership and atomic broadcast group communication services.
