Error detection by duplicated instructions in super-scalar processors
- 7 August 2002
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Reliability
- Vol. 51 (1) , 63-75
- https://doi.org/10.1109/24.994913
Abstract
This paper proposes a pure software technique, "error detection by duplicated instructions" (EDDI), for detecting errors during normal system operation. Unlike error-detection techniques that rely on hardware redundancy, EDDI requires no hardware modifications to add error-detection capability to the original system. EDDI duplicates instructions during compilation and uses different registers and variables for the new instructions. For faults in the code segment of memory in particular, formulas are derived to estimate the error-detection coverage of EDDI using probabilistic methods. These formulas use program statistics collected during compilation. EDDI was applied to eight benchmark programs and the error-detection coverage was estimated. The estimates were then verified by simulation, in which a fault injector forced a bit-flip in the code segment of the executable machine code. The simulation results validated the estimated fault coverage and showed that approximately 1.5% of injected faults produced incorrect results in the eight benchmark programs with EDDI, while, on average, 20% of injected faults produced undetected incorrect results in the programs without EDDI. Based on the theoretical estimates and the fault-injection experiments, EDDI can provide over 98% fault coverage without any extra hardware for error detection. This pure software technique is especially useful when designers cannot change the hardware but still need dependability in the computer system. To reduce the performance overhead, EDDI schedules the instructions added for error detection so that "instruction-level parallelism" (ILP) is maximized. Performance overhead can be reduced by increasing ILP within a single super-scalar processor: the execution-time overhead in a 4-way super-scalar processor is less than the execution-time overhead in processors that can issue two instructions per cycle.
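The duplication idea in the abstract can be illustrated with a small sketch. This is not the paper's compiler pass. It assumes a hypothetical three-operand toy instruction set (`li`, `add`, `mul`, `store`), a hypothetical `check` pseudo-instruction that compares a register with its shadow copy before each store, and a simplified fault model that flips one bit in an instruction's result register, rather than a bit in the code segment as in the paper's experiments. All names (`eddi_transform`, `run`, `shadow`, `ErrorDetected`) are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple  # ("li", dst, imm) | ("add"/"mul", dst, a, b) | ("store", addr, src)


class ErrorDetected(Exception):
    """Raised when a master register and its shadow copy disagree."""


def shadow(reg: str) -> str:
    # Hypothetical naming scheme: the duplicate of r3 lives in r3'.
    return reg + "'"


def eddi_transform(program: List[Instr]) -> List[Instr]:
    """Duplicate each computation on shadow registers; check before stores."""
    out: List[Instr] = []
    for ins in program:
        op = ins[0]
        if op == "li":
            out += [ins, ("li", shadow(ins[1]), ins[2])]
        elif op in ("add", "mul"):
            _, dst, a, b = ins
            out += [ins, (op, shadow(dst), shadow(a), shadow(b))]
        elif op == "store":
            # Compare master and shadow values before they leave the registers.
            out += [("check", ins[2], shadow(ins[2])), ins]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return out


def run(program: List[Instr], flip_at: Optional[int] = None, bit: int = 0) -> Dict[str, int]:
    """Interpret the program; optionally flip one bit in the result of instruction flip_at."""
    regs: Dict[str, int] = {}
    mem: Dict[str, int] = {}
    for i, ins in enumerate(program):
        op = ins[0]
        if op == "li":
            regs[ins[1]] = ins[2]
        elif op == "add":
            regs[ins[1]] = regs[ins[2]] + regs[ins[3]]
        elif op == "mul":
            regs[ins[1]] = regs[ins[2]] * regs[ins[3]]
        elif op == "check":
            if regs[ins[1]] != regs[ins[2]]:
                raise ErrorDetected(f"mismatch on {ins[1]} at instruction {i}")
        elif op == "store":
            mem[ins[1]] = regs[ins[2]]
        # Simplified fault model (an assumption): corrupt the destination register.
        if flip_at == i and op in ("li", "add", "mul"):
            regs[ins[1]] ^= 1 << bit
    return mem


PROGRAM: List[Instr] = [
    ("li", "r1", 6),
    ("li", "r2", 7),
    ("mul", "r3", "r1", "r2"),
    ("store", "out", "r3"),
]

if __name__ == "__main__":
    print(run(PROGRAM))                 # fault-free run
    print(run(PROGRAM, flip_at=2))      # unprotected: corrupted value is stored silently
    try:
        run(eddi_transform(PROGRAM), flip_at=4)
    except ErrorDetected as exc:
        print("detected:", exc)
```

In this sketch the master and shadow instruction streams share no registers, so the duplicates are independent of the originals until the `check`. That independence is what leaves room for a super-scalar processor to issue them in parallel, as the abstract describes.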