A study of time-redundant fault tolerance techniques for high-performance pipelined computers
- 7 January 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 436-443
- https://doi.org/10.1109/ftcs.1989.105616
Abstract
A class of fault-tolerance techniques using time redundancy can be a viable alternative for high-performance pipelined processors. Time-redundant fault-tolerance techniques, such as recomputing with shifted operands (RESO), have not been very popular, partly because of the perceived time overhead of such techniques. While the per-instruction time overhead can be quite high, especially if the degree of pipelining is low, the overhead can be very small (and possibly negligible) when the execution of an entire program is considered and the degree of pipelining is high. Simulation studies were carried out on the Cray-1 scalar unit using the well-known Livermore loops as benchmarks to determine the performance loss due to time-redundant fault-tolerance techniques. The results show that the overhead for such techniques is less than 10% in almost all cases and is negligibly small in most cases. This suggests that time-redundant techniques can be useful for fault tolerance in high-performance scalar processors with multiple pipelined functional units.
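RESO, named in the abstract, detects errors by running an operation twice on the same hardware: once on the original operands and once on operands shifted left by k bits, with the second result shifted back before the two are compared. A fault confined to one bit slice then corrupts different result bits in the two passes. The following is a minimal Python sketch, not the paper's simulator, assuming a 16-bit adder with one extra bit slice (RESO with k = 1) and a deliberately simplified fault model in which one bit slice's sum output is stuck at 0 or 1. The names `adder`, `reso_add`, `WIDTH`, `SHIFT`, and `ADDER_BITS` are hypothetical.

```python
import random

# Hypothetical parameters for illustration; not taken from the paper.
WIDTH = 16                  # operand width in bits
SHIFT = 1                   # RESO-1: operands shifted left by one bit
ADDER_BITS = WIDTH + SHIFT  # the adder needs SHIFT extra bit slices


def adder(x, y, fault=None):
    """Add on an ADDER_BITS-wide adder.

    `fault` is an assumed, simplified fault model: (slice_index, value)
    forces the sum output of one bit slice to 0 or 1 on every use.
    """
    s = (x + y) & ((1 << ADDER_BITS) - 1)
    if fault is not None:
        slice_index, value = fault
        if value:
            s |= 1 << slice_index
        else:
            s &= ~(1 << slice_index)
    return s


def reso_add(a, b, fault=None):
    """Compute (a + b) mod 2**WIDTH twice and compare; return (result, error_detected)."""
    mask = (1 << WIDTH) - 1
    # First pass: normal operands.
    first = adder(a, b, fault) & mask
    # Second pass: shifted operands on the same (possibly faulty) adder, result shifted back.
    second = (adder(a << SHIFT, b << SHIFT, fault) >> SHIFT) & mask
    return first, first != second


if __name__ == "__main__":
    rng = random.Random(0)
    undetected = 0
    for _ in range(10_000):
        a, b = rng.getrandbits(WIDTH), rng.getrandbits(WIDTH)
        fault = (rng.randrange(ADDER_BITS), rng.randrange(2))
        result, detected = reso_add(a, b, fault)
        if result != (a + b) & ((1 << WIDTH) - 1) and not detected:
            undetected += 1
    print("undetected errors:", undetected)  # prints 0 under this fault model
```

Under this simplified model every corrupted result is flagged: a fault in slice i can only alter bit i of the first result, but only bit i - 1 of the shifted-back second result, so the two cannot agree on a wrong value. The converse does not hold, since a fault that affects only the shifted pass raises a flag even when the first result is correct. The price is the second pass through the adder, which is the time overhead the abstract evaluates.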