Load Redistribution Under Failure in Distributed Systems
- 1 September 1983
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. C-32 (9), 799-808
- https://doi.org/10.1109/tc.1983.1676329
Abstract
In order to implement a distributed system with fail-soft capabilities, it is necessary to specify algorithms which redistribute the workload of a failed processor to the remaining good processors. This paper develops a general model to analyze the behavior of these algorithms in a distributed system. Such algorithms should be used with caution, as they are capable of making the entire system unstable. By unstable we mean that if a processor fails and its workload is redistributed, the increased workload directed towards the rest of the system could drive one or more of the processors into overload, resulting in a serious degradation of system performance. Using the general model, we have studied a class of load redistribution algorithms which use various techniques to redistribute workload. These techniques include buffering jobs arriving at the failed processor, transmitting only the jobs in the queue of the failed processor, and rerouting all jobs around the failed processor. For this class of algorithms we have derived closed-form expressions for the performance of the system as a function of job arrival rate, job service rate, processor failure rate, and processor service rate. In addition, we have defined a criterion which, if adhered to, will guarantee system stability in the event of failure.
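The notion of instability described in the abstract can be illustrated with a minimal sketch. It is not the paper's model or its stability criterion: it assumes each processor behaves like a single-server queue that is overloaded once its arrival rate reaches its service rate, and it assumes a hypothetical rerouting policy that splits the failed processor's arrivals evenly among the survivors. The function names `redistribute`, `overloaded`, and `is_stable_after_failure` are illustrative only.

```python
from typing import List


def redistribute(arrival: List[float], failed: int) -> List[float]:
    """Reroute the failed processor's arrival rate evenly to the survivors.

    Assumption: even splitting is a hypothetical policy chosen for
    illustration; the abstract does not specify the routing rule.
    """
    survivors = [i for i in range(len(arrival)) if i != failed]
    if not survivors:
        raise ValueError("no surviving processors to absorb the load")
    share = arrival[failed] / len(survivors)
    return [0.0 if i == failed else arrival[i] + share
            for i in range(len(arrival))]


def overloaded(arrival: List[float], service: List[float],
               failed: int) -> List[int]:
    """Indices of surviving processors whose new arrival rate is at or
    above their service rate (utilization >= 1 under the assumed model)."""
    new_arrival = redistribute(arrival, failed)
    return [i for i in range(len(new_arrival))
            if i != failed and new_arrival[i] >= service[i]]


def is_stable_after_failure(arrival: List[float], service: List[float],
                            failed: int) -> bool:
    """True if no surviving processor is driven into overload."""
    return not overloaded(arrival, service, failed)


# Three processors, each able to serve 5 jobs per unit time.
arrival_rates = [2.0, 2.0, 4.5]
service_rates = [5.0, 5.0, 5.0]

# Losing the lightly loaded processor 0 pushes processor 2 to 5.5 > 5.0.
print(overloaded(arrival_rates, service_rates, failed=0))
# Losing processor 2 leaves the others at 4.25 each, below 5.0.
print(is_stable_after_failure(arrival_rates, service_rates, failed=2))
```

Under these assumptions, the failure of a lightly loaded processor can still overload a heavily loaded neighbor, which is the kind of cascade the abstract warns about.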
This publication has 8 references indexed in Scilit:
- Performability Evaluation of the SIFT Computer. IEEE Transactions on Computers, 1980
- A Comparative Study of Some Two-Processor Organizations. IEEE Transactions on Computers, 1980
- Models for Dynamic Load Balancing in a Heterogeneous Multiple Processor System. IEEE Transactions on Computers, 1979
- Performance-Related Reliability Measures for Computing Systems. IEEE Transactions on Computers, 1978
- Product Form and Local Balance in Queueing Networks. Journal of the ACM, 1977
- Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 1975
- Approximate Analysis of General Queuing Networks. IBM Journal of Research and Development, 1975
- Application of the Diffusion Approximation to Queueing Networks I: Equilibrium Queue Distributions. Journal of the ACM, 1974