Design and evaluation of hardware strategies for reconfiguring hypercubes and meshes under faults
- 1 July 1994
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 43 (7), 841-848
- https://doi.org/10.1109/12.293264
Abstract
This paper discusses the design of two reconfiguration strategies for distributed-memory multicomputer architectures under failures. The specific architectures to which we apply the techniques are hypercubes and meshes. The first scheme uses spare processors attached to certain processors in the hypercube or mesh using a novel embedding technique. The second approach places spare processors along specific links in the hypercube or mesh. Both schemes involve the mapping of logical links of a virtual machine onto a set of physical links in the final reconfigured machine and hence suffer some performance degradation. We characterize the performance degradation through trace-driven simulation of real applications running on the faulty and reconfigured system. We find that the schemes have high reliability, suffer little degradation in performance, and are very low in cost.