Routing in modular fault tolerant multiprocessor systems
- 2 January 2003
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- pp. 185-193
- https://doi.org/10.1109/ftcs.1992.243601
Abstract
The authors consider a class of modular multiprocessor architectures in which spares are added to each module to cover for faulty nodes within that module, thus forming a fault-tolerant basic block (FTBB). The goal is to preserve the logical adjacency between active nodes by means of a routing algorithm that delivers messages successfully to their destinations. Two-phase routing strategies are introduced that route messages first to their destination FTBB and then to the destination nodes within that FTBB. This strategy may be applied to a variety of architectures, including binary hypercubes and 3-D tori. It is shown that, in the presence of $f$ faults in these systems, the worst-case length of the message route is $\max(\sigma + f, (K+1)\sigma) + M$, where $\sigma$ is the length of the shortest path in the absence of faults, and $M$ and $K$ are the numbers of primary nodes and spare nodes in an FTBB, respectively. The average routing overhead is much lower than the worst-case overhead.
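To make the two-phase idea concrete, here is a minimal sketch for a fault-free binary hypercube. It assumes a hypothetical addressing scheme in which the high n - m address bits identify the FTBB and the low m bits identify a node inside it, and it uses plain dimension-order bit correction within each phase. It does not model spares, faults, or the authors' reconfiguration, so it only reproduces the fault-free sigma term (the Hamming distance). The helper `worst_case_route_length` simply evaluates the bound given in the abstract. The function names and the parameters `n` and `m` are illustrative assumptions, not taken from the paper.

```python
def two_phase_route(src: int, dst: int, n: int, m: int) -> list[int]:
    """Fault-free two-phase route on an n-dimensional binary hypercube.

    Hypothetical layout (an assumption, not the paper's): the low m address
    bits index a node inside its module (FTBB), and the high n - m bits
    identify the module itself.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    size = 1 << n
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("addresses must be n-bit integers")

    path = [src]
    cur = src
    # Phase 1: correct the module (high) bits, so the message enters the
    # destination FTBB while its intra-module (low) bits are unchanged.
    for bit in range(n - 1, m - 1, -1):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit
            path.append(cur)
    # Phase 2: correct the intra-module (low) bits to reach the destination node.
    for bit in range(m - 1, -1, -1):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit
            path.append(cur)
    return path


def worst_case_route_length(sigma: int, f: int, K: int, M: int) -> int:
    """Evaluate the abstract's worst-case bound max(sigma + f, (K + 1) * sigma) + M.

    sigma: fault-free shortest-path length, f: number of faults,
    K: spare nodes per FTBB, M: primary nodes per FTBB.
    """
    return max(sigma + f, (K + 1) * sigma) + M
```

For example, `two_phase_route(0b0000, 0b1011, 4, 2)` visits nodes 0, 8, 10, 11: one hop to enter the destination module, then two hops inside it, for three hops in total, which equals the Hamming distance between the two addresses. With hypothetical values sigma = 3, f = 2, K = 1, M = 4, `worst_case_route_length` returns max(5, 6) + 4 = 10.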