General Reduction Methods for the Reliability Analysis of Distributed Computing Systems

Open Access

1 July 1993

journal article
Published by Oxford University Press (OUP) in The Computer Journal

Vol. 36 (7) , 631-644
https://doi.org/10.1093/comjnl/36.7.631

Abstract

The reliability of a distributed computing system is the probability that a distributed program, which runs on multiple processing elements and needs to communicate with other processing elements for remote data files, will be executed successfully. This reliability varies according to (1) the topology of the distributed computing system, (2) the reliability of the communication links, (3) the distribution of data files and programs among processing elements, and (4) the data files required to execute a program. Thus, analyzing the reliability of a distributed computing system is more complicated than the K-terminal reliability problem, and many of the reliability-preserving reductions that speed up the computation of K-terminal reliability cannot be applied to it. In this paper, we propose several reduction methods for computing the reliability of distributed computing systems. These reduction methods can dramatically reduce the size of a distributed computing system, and therefore speed up the reliability computation.
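
To make the definition above concrete, the following is a minimal sketch that computes this reliability by exhaustive enumeration of link states. It is not one of the paper's reduction methods. It assumes perfectly reliable processing elements, independent link failures, and undirected links. The names `program_reliability`, `reachable_files`, `links`, `files_at`, `program_hosts`, and `needed_files` are hypothetical and chosen only for illustration.

```python
from itertools import product


def reachable_files(start, adj, files_at):
    """Collect every data file stored on nodes reachable from `start`."""
    seen = {start}
    stack = [start]
    found = set()
    while stack:
        node = stack.pop()
        found |= files_at.get(node, set())
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found


def program_reliability(links, files_at, program_hosts, needed_files):
    """Brute-force reliability of one distributed program.

    links:         {(u, v): probability that the link is up}
    files_at:      {node: set of data files stored on that node}
    program_hosts: nodes holding a copy of the program
    needed_files:  set of data files the program requires
    """
    edges = list(links)
    total = 0.0
    # Enumerate all 2^|E| up/down combinations of the links.
    for state in product([True, False], repeat=len(edges)):
        prob = 1.0
        adj = {}
        for (u, v), up in zip(edges, state):
            p = links[(u, v)]
            prob *= p if up else 1.0 - p
            if up:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        # The program succeeds if some host can reach all needed files.
        if any(needed_files <= reachable_files(h, adj, files_at)
               for h in program_hosts):
            total += prob
    return total


# Illustrative three-node ring; every link is up with probability 0.9.
links = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.9}
files_at = {"B": {"f1"}, "C": {"f2"}}
r = program_reliability(links, files_at, ["A"], {"f1", "f2"})
```

In this example the program on node A needs one file from B and one from C, so it succeeds exactly when all three nodes are connected, which has probability 3p²(1 − p) + p³ with p = 0.9. The enumeration grows as 2^|E| in the number of links, which is why reductions that shrink the system before the computation matter.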
