Fault tolerance in highly parallel hardware systems
- 1 February 1994
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Micro
- Vol. 14 (1), 60-68
- https://doi.org/10.1109/40.259902
Abstract
As the demand for highly parallel systems grows, the vast amount of concurrently operating hardware involved can make it difficult to guarantee proper system behavior. Problems arise both from permanent and transient hardware faults and from errors caused by improper programming. A number of fault tolerance solutions have emerged. Following a survey of fault tolerance in arrays, a discussion of solutions for more specialized architectures is presented.