Hardware-Related Software Errors: Measurement and Analysis
- 1 February 1985
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering
- Vol. SE-11 (2), 223-231
- https://doi.org/10.1109/tse.1985.232198
Abstract
This paper describes an analysis of hardware-related software (HW/SW) errors on an MVS/SP operating system at Stanford University. The analysis procedure demonstrates a methodology for evaluating the interaction between hardware and software as it relates to system reliability. The paper examines the operating system's handling of HW/SW errors and also the effectiveness of recovery management. Nearly 35 percent of all observed software failures were found to be hardware-related. The analysis shows that the operating system is seldom able to diagnose that a software error may be hardware-related. The impact of HW/SW errors on the system is evaluated by measuring the effectiveness of system recovery in containing the propagation of HW/SW errors. The system failure probability for HW/SW errors is close to three times that for software errors in general. The observed HW/SW errors are seen to have a specific pattern, suggesting the possibility of the use of such error patterns for intelligent error prediction and recovery.
This publication has 6 references indexed in Scilit:
- A Statistical Load Dependency Model for CPU Errors at SLAC. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- A Study of Software Failures and Recovery in the MVS Operating System. IEEE Transactions on Computers, 1984
- A Statistical Failure/Load Relationship: Results of a Multicomputer Study. IEEE Transactions on Computers, 1982
- The Evolution of the MVS Operating System. IBM Journal of Research and Development, 1981
- Persistent Software Errors. IEEE Transactions on Software Engineering, 1981
- An Analysis of Errors and Their Causes in System Programs. IEEE Transactions on Software Engineering, 1975