ED⁴I: error detection by diverse data and duplicated instructions
- 7 August 2002
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 51 (2), pp. 180-199
- https://doi.org/10.1109/12.980007
Abstract
Errors in computing systems can cause abnormal behavior and degrade data integrity and system availability. Errors should be avoided especially in embedded systems for critical applications. However, as VLSI technologies have trended toward smaller feature sizes, lower supply voltages and higher frequencies, there is growing concern about temporary as well as permanent errors in embedded systems; it is therefore essential to detect those errors. Software-implemented hardware fault tolerance (SIHFT) is a low-cost alternative to hardware fault-tolerance techniques for embedded processors: it does not require any hardware modification of commercial off-the-shelf (COTS) processors. ED⁴I (error detection by data diversity and duplicated instructions) is a SIHFT technique that detects both permanent and temporary errors by executing two "different" programs (with the same functionality) and comparing their outputs. ED⁴I maps each number, x, in the original program into a new number x', and then transforms the program so that it operates on the new numbers, allowing the results to be mapped back for comparison with the results of the original program. The mapping in the transformation of ED⁴I is $x' = k \cdot x$ for integer numbers, where k determines the fault detection probability and data integrity of the system. For floating-point numbers, we find a value $k_f$ for the fraction and $k_e$ for the exponent separately, and use $k = k_f \times 2^{k_e}$ as the value of k. We have demonstrated how to choose an optimal value of k for the transformation. This paper shows that, for integer programs, the transformation with k = -2 was the most desirable choice in six of the seven benchmark programs we simulated: it maximizes the fault detection probability under the condition that data integrity is highest.
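The abstract describes the integer mapping only at a high level. Below is a minimal Python sketch of the idea, assuming a toy program (sum the values above a threshold, then add a constant) invented here for illustration. The names `original`, `transformed`, `ed4i_check` and the single-bit-flip fault model are hypothetical and not taken from the paper. The sketch shows the ingredients the abstract names: every number x, constants included, becomes k·x; the program is adjusted so it computes the same function on the new numbers (with k = -2, the comparison `>` must become `<`); and the diverse result is mapped back by dividing by k and compared with the original output.

```python
from typing import List, Tuple

# Diversity factor k (assumed default); the abstract reports k = -2 as the
# most desirable choice in six of seven integer benchmarks.
K = -2


def original(values: List[int], threshold: int) -> int:
    """Hypothetical original program: sum the values above a threshold, then add 10."""
    total = 0
    for v in values:
        if v > threshold:
            total += v
    return total + 10


def transformed(values_k: List[int], threshold_k: int, k: int = K,
                flip_bit: int = -1) -> int:
    """Diverse version of the same program, operating on x' = k*x.

    Constants are scaled by k, and for negative k the comparison flips,
    since x > t is equivalent to k*x < k*t when k < 0. If flip_bit >= 0,
    one bit of the result is XORed to emulate a transient fault
    (a hypothetical fault model used only for illustration).
    """
    total = 0
    for v in values_k:
        above = v > threshold_k if k > 0 else v < threshold_k
        if above:
            total += v
    result = total + 10 * k
    if flip_bit >= 0:
        result ^= 1 << flip_bit
    return result


def ed4i_check(values: List[int], threshold: int, k: int = K,
               flip_bit: int = -1) -> Tuple[bool, int]:
    """Run both programs, map the diverse result back by k, and compare."""
    expected = original(values, threshold)
    diverse = transformed([k * v for v in values], k * threshold, k, flip_bit)
    # A result that is not a multiple of k, or that maps back to a different
    # value, indicates an error.
    ok = diverse % k == 0 and diverse // k == expected
    return ok, expected


if __name__ == "__main__":
    print(ed4i_check([3, 7, 1, 9], 4))              # (True, 26)
    print(ed4i_check([3, 7, 1, 9], 4, flip_bit=2))  # (False, 26): error detected
```

In the fault-free run the diverse program returns -52 = -2 × 26, which maps back to the original result. The injected bit flip changes the diverse result to -56, which maps back to 28 and is flagged as a mismatch. In the real technique, detection instead relies on a hardware fault corrupting x and k·x differently.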