Automatic service availability management in asynchronous distributed systems
- 17 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. se 11, 58-68
- https://doi.org/10.1109/iwcds.1994.289935
Abstract
An availability management service is responsible for automatically ensuring that all critical services of a distributed system remain continuously available to users despite node removals and restarts caused by failures, maintenance, and growth. We present an availability management service for an asynchronous distributed system characterized by unbounded communication delays and by the availability at all nodes of local, nonsynchronized timers that measure the passage of real time with some known accuracy. Examples of such systems are Unix-, VMS-, VM-, or MVS-based distributed systems connected by local area networks such as Ethernet, token ring, FDDI, or channel-to-channel adapters. The presentation stresses the main ideas behind this new service and outlines a simple design that depends upon the existence of asynchronous membership and atomic broadcast group communication services.
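The abstract does not give the design itself, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes that a membership service delivers the same sequence of views (sets of live nodes) to every node, so each node can compute the same service-to-node assignment locally without further coordination. The names `assign_services` and `on_view_change`, the `replicas` parameter, and the round-robin placement rule are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Assignment = Dict[str, List[str]]


def assign_services(view: FrozenSet[str], services: List[str],
                    replicas: int = 1) -> Assignment:
    """Deterministically map each critical service to nodes in `view`.

    Hypothetical placement rule: round-robin over the sorted member list.
    Because the rule depends only on the view, every node that receives
    the same view computes the same assignment.
    """
    members = sorted(view)  # identical order on every node
    if not members:
        return {service: [] for service in services}
    count = min(replicas, len(members))
    assignment: Assignment = {}
    for i, service in enumerate(sorted(services)):
        assignment[service] = [members[(i + j) % len(members)]
                               for j in range(count)]
    return assignment


def on_view_change(old: Assignment, new_view: FrozenSet[str],
                   services: List[str],
                   replicas: int = 1) -> Tuple[Assignment, Assignment]:
    """Recompute the assignment for a new view.

    Returns the new assignment and, per service, the nodes that must
    newly start hosting it (those absent from the old assignment).
    """
    new = assign_services(new_view, services, replicas)
    to_start = {service: [node for node in new[service]
                          if node not in old.get(service, [])]
                for service in services}
    return new, to_start
```

For example, with the view `{"a", "b", "c"}`, services `["db", "mail"]`, and `replicas=2`, a call to `on_view_change` after node `b` leaves reports which surviving nodes must start hosting each service. A real service would also have to handle state transfer and views that differ transiently across nodes, which this sketch ignores.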