Experimental evaluation of the fail-silent behaviour in programs with consistency checks
- 23 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 07313071, p. 394-403
- https://doi.org/10.1109/ftcs.1996.534625
Abstract
An important research topic is whether a non-duplicated computer can be made fail-silent, since that behaviour is assumed a priori in many algorithms. However, previous research has shown that in systems using a simple behaviour-based error detection mechanism invisible to the programmer (e.g. memory protection), the percentage of fail-silent violations could be higher than 10%. Since the study of these errors has shown that they were mostly caused by pure data errors, we evaluate the effectiveness of software techniques capable of checking the semantics of the data, such as assertions, in detecting these remaining errors. The results of injecting physical pin-level faults show that these tests can prevent about 40% of the fail-silent model violations that escape the simple hardware-based error detection techniques. In order to decouple the intrinsic limitations of the tests used from other factors that might affect their error detection capabilities, we evaluated a special class of software checks known for its high theoretical coverage: algorithm-based fault tolerance (ABFT). The analysis of the remaining errors showed that most of them remained undetected due to short-range control flow errors. When very simple software-based control flow checking was combined with the semantic tests, the target system, without any dedicated error detection hardware, behaved according to the fail-silent model for about 98% of all the faults injected.
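The kind of ABFT check the abstract refers to can be illustrated with the classic checksum scheme for matrix multiplication. The sketch below is not taken from the paper: the function names, the `corrupt` parameter (a hypothetical stand-in for an injected data error), and the choice to raise an exception instead of returning a result are all assumptions made for illustration. It uses integer matrices so that checksums can be compared exactly; floating-point data would need a tolerance.

```python
def column_checksum(a):
    """Return a copy of `a` with one extra row holding each column's sum."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Return a copy of `b` with one extra column holding each row's sum."""
    return [row + [sum(row)] for row in b]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def find_inconsistencies(c_full):
    """List the data rows and columns whose checksum does not match."""
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    return bad_rows, bad_cols


def checked_matmul(a, b, corrupt=None):
    """Multiply with checksums; refuse to return a result that fails them.

    `corrupt` is a hypothetical (row, col, delta) triple used only to
    simulate a data error in the computed product.
    """
    c_full = matmul(column_checksum(a), row_checksum(b))
    if corrupt is not None:
        i, j, delta = corrupt
        c_full[i][j] += delta
    bad_rows, bad_cols = find_inconsistencies(c_full)
    if bad_rows or bad_cols:
        # Fail-silent style reaction: produce no output rather than a wrong one.
        raise RuntimeError(f"checksum mismatch: rows {bad_rows}, cols {bad_cols}")
    # Strip the checksum row and column before handing back the product.
    return [row[:-1] for row in c_full[:-1]]
```

Calling `checked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns the ordinary product, while passing `corrupt=(0, 1, 5)` makes the row and column checksums disagree and the function raises instead of returning a wrong matrix. A check of this kind only covers data errors in the product; it does nothing about control flow errors that skip the check itself, which is the class of remaining errors the abstract mentions.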
This publication has 11 references indexed in Scilit:
- Two fault injection techniques for test of fault handling mechanisms. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Fault injection for dependability validation of fault-tolerant computing systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Two software techniques for on-line error detection. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Experimental evaluation of the fail-silent behavior in computers without error masking. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- A fault-tolerant FFT processor. IEEE Transactions on Computers, 1988
- The Delta-4 approach to dependability in open distributed computing systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 1988
- Concurrent error detection using watchdog processors: a survey. IEEE Transactions on Computers, 1988
- Fault tolerance in process control: an overview and examples of European products. IEEE Micro, 1987
- A measurement-based model for workload dependence of CPU errors. IEEE Transactions on Computers, 1986
- Algorithm-based fault tolerance for matrix operations. IEEE Transactions on Computers, 1984