A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor
- 6 May 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Single-event upsets from particle strikes have become a key challenge in microprocessor design. Techniques to deal with these transient faults exist, but come at a cost. Designers clearly require accurate estimates of processor error rates to make appropriate cost/reliability tradeoffs. This paper describes a method for generating these estimates. A key aspect of this analysis is that some single-bit faults (such as those occurring in the branch predictor) will not produce an error in a program's output. We define a structure's architectural vulnerability factor (AVF) as the probability that a fault in that particular structure will result in an error. A structure's error rate is the product of its raw error rate, as determined by process and circuit technology, and the AVF. Unfortunately, computing AVFs of complex structures, such as the instruction queue, can be quite involved. We identify numerous cases, such as prefetches, dynamically dead code, and wrong-path instructions, in which a fault will not affect correct execution. We instrument a detailed IA64 processor simulator to map bit-level microarchitectural state to these cases, generating per-structure AVF estimates. This analysis shows AVFs of 28% and 9% for the instruction queue and execution units, respectively, averaged across dynamic sections of the entire CPU2000 benchmark suite.
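The relationship stated in the abstract (error rate = raw error rate × AVF) can be illustrated with a minimal Python sketch. It is not the paper's simulator instrumentation. It assumes, purely for illustration, that the AVF of a queue-like structure can be approximated as the fraction of entry-cycles holding instructions outside the benign cases the abstract lists (prefetches, dynamically dead code, wrong-path instructions). The names `Occupancy`, `estimate_avf`, and `structure_error_rate`, the trace, and the raw error rate of 1000 are all hypothetical.

```python
from dataclasses import dataclass

# Cases the abstract names as not affecting correct execution.
BENIGN_KINDS = {"prefetch", "dynamically_dead", "wrong_path"}


@dataclass
class Occupancy:
    kind: str    # hypothetical label for the instruction held in an entry
    cycles: int  # number of cycles the entry held that instruction


def estimate_avf(occupancies, num_entries, total_cycles):
    """Approximate AVF as vulnerable entry-cycles over all entry-cycles.

    Idle entry-cycles and cycles spent holding benign instructions are
    treated as not vulnerable (an assumption of this sketch).
    """
    vulnerable = sum(o.cycles for o in occupancies if o.kind not in BENIGN_KINDS)
    return vulnerable / (num_entries * total_cycles)


def structure_error_rate(raw_error_rate, avf):
    """Error rate of a structure: raw error rate scaled by its AVF."""
    return raw_error_rate * avf


# Hypothetical occupancy trace for a 2-entry structure observed for 100 cycles.
trace = [
    Occupancy("normal", 30),
    Occupancy("prefetch", 10),
    Occupancy("wrong_path", 30),
    Occupancy("normal", 20),
    Occupancy("dynamically_dead", 24),
]

avf = estimate_avf(trace, num_entries=2, total_cycles=100)
rate = structure_error_rate(raw_error_rate=1000.0, avf=avf)  # units follow the raw rate
print(f"AVF = {avf:.2f}, error rate = {rate:.1f}")
```

In this made-up trace, 50 of the 200 entry-cycles hold instructions whose corruption could matter, so the sketch yields an AVF of 0.25 and a derated error rate one quarter of the raw rate.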