On the Fault Tolerance of Some Popular Bounded-Degree Networks
- 1 October 1998
- journal article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Computing
- Vol. 27 (5) , 1303-1333
- https://doi.org/10.1137/s0097539793255163
Abstract
In this paper, we analyze the fault tolerance of several bounded-degree networks that are commonly used for parallel computation. Among other things, we show that an $N$-node butterfly network containing $N^{1-\epsilon}$ worst-case faults (for any constant $\epsilon > 0$) can emulate a fault-free butterfly of the same size with only constant slowdown. The same result is proved for the shuffle-exchange network. Hence, these networks become the first connected bounded-degree networks known to be able to sustain more than a constant number of worst-case faults without suffering more than a constant-factor slowdown in performance. We also show that an $N$-node butterfly whose nodes fail with some constant probability $p$ can emulate a fault-free network of the same type and size with a slowdown of $2^{O(\log^* N)}$. These emulation schemes combine the technique of redundant computation with new algorithms for routing packets around faults in hypercubic networks. We also present techniques for tolerating faults that do not rely on redundant computation. These techniques tolerate fewer faults but are more widely applicable because they can be used with other networks such as binary trees and meshes of trees.
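To give a feel for how slowly the $2^{O(\log^* N)}$ slowdown for random faults grows, the iterated logarithm $\log^* N$ can be computed directly. The following is a minimal sketch and is not taken from the paper: the function name `log_star` is hypothetical, base 2 is assumed, and the constant hidden in the $O(\cdot)$ is not given in the abstract, so only $\log^* N$ itself is evaluated.

```python
import math


def log_star(n: float) -> int:
    """Iterated logarithm (base 2, an assumption): the number of times
    log2 must be applied to n before the value drops to 1 or below."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


if __name__ == "__main__":
    # Print log* N for a few network sizes N.
    for n in (2, 16, 65536, 2 ** 64):
        print(n, log_star(n))
```

For any network size that could be built in practice, $\log^* N$ is at most 5, so the bound behaves almost like a constant even though it is not one asymptotically.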
This publication has 29 references indexed in Scilit:
- Fault-Tolerant Meshes with Small Degree. SIAM Journal on Computing, 1997
- Reconfiguring Arrays with Faults Part I: Worst-Case Faults. SIAM Journal on Computing, 1997
- On-Line Algorithms for Path Selection in a Nonblocking Network. SIAM Journal on Computing, 1996
- Fault-tolerant meshes and hypercubes with minimal numbers of spares. IEEE Transactions on Computers, 1993
- Tolerating faults in hypercubes using subcube partitioning. IEEE Transactions on Computers, 1992
- Designing fault-tolerant systems using automorphisms. Journal of Parallel and Distributed Computing, 1991
- Running algorithms efficiently on faulty hypercubes (extended abstract). ACM SIGARCH Computer Architecture News, 1991
- Fault tolerance in hypercube-derivative networks (preliminary version). ACM SIGARCH Computer Architecture News, 1991
- On designing and reconfiguring k-fault-tolerant tree architectures. IEEE Transactions on Computers, 1990
- The Extra Stage Cube: A Fault-Tolerant Interconnection Network for Supersystems. IEEE Transactions on Computers, 1982