Design and evaluation of hardware strategies for reconfiguring hypercubes and meshes under faults

1 July 1994

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers

Vol. 43 (7), 841-848
https://doi.org/10.1109/12.293264

Abstract

This paper discusses the design of two reconfiguration strategies for distributed-memory multicomputer architectures under failures. The specific architectures to which we apply the techniques are hypercubes and meshes. The first scheme uses spare processors attached to certain processors in the hypercube or mesh using a novel embedding technique. The second approach places spare processors along specific links in the hypercube or mesh. Both schemes involve the mapping of logical links of a virtual machine onto a set of physical links in the final reconfigured machine and hence suffer some performance degradation. We characterize the performance degradation through trace-driven simulation of real applications running on the faulty and reconfigured system. We find that the schemes have high reliability, suffer little degradation in performance, and are very low in cost.
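
To make the idea of mapping logical links onto physical links concrete, the following is a minimal illustrative sketch, not the paper's embedding technique. It assumes a single spare processor (labelled "S", a hypothetical name) attached to one host node of a hypercube, lets the spare take over the logical role of one faulty node, and measures how many physical hops each of the faulty node's logical links now spans. The function names and the breadth-first routing are assumptions made for illustration only.

```python
from collections import deque


def hypercube_links(dim):
    """Physical adjacency of a dim-dimensional hypercube (nodes differ in one bit)."""
    return {v: {v ^ (1 << b) for b in range(dim)} for v in range(1 << dim)}


def reconfigure(dim, faulty, spare_host):
    """Remove the faulty node and attach one spare "S" to spare_host (assumed scheme)."""
    if faulty == spare_host:
        raise ValueError("the spare's host must be a healthy node")
    adj = hypercube_links(dim)
    for nbr in adj.pop(faulty):
        adj[nbr].discard(faulty)
    adj["S"] = {spare_host}
    adj[spare_host].add("S")
    return adj


def path_length(adj, src, dst):
    """Hop count of a shortest physical path, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return dist[v]
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None  # unreachable


def dilations(dim, faulty, spare_host):
    """Physical hops spanned by each logical link of the replaced node."""
    adj = reconfigure(dim, faulty, spare_host)
    logical_neighbours = sorted(hypercube_links(dim)[faulty])
    return {g: path_length(adj, "S", g) for g in logical_neighbours}


if __name__ == "__main__":
    # 3-cube, node 0 fails, spare attached to node 3.
    print(dilations(3, faulty=0, spare_host=3))
```

In this toy setting a logical link that was one hop long before the fault may span several physical hops afterwards, which is the kind of overhead the abstract refers to as performance degradation. The actual placement of spares and the resulting path lengths in the paper's schemes may differ.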
