A methodology for detection and estimation of software aging
- 27 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 774 (10719458), 283-292
- https://doi.org/10.1109/issre.1998.730892
Abstract
The phenomenon of software aging refers to the accumulation of errors during the execution of software, which eventually results in its crash/hang failure. A gradual performance degradation may also accompany software aging. Proactive fault management techniques such as "software rejuvenation" (Y. Huang et al., 1995) may be used to counteract aging if it exists. We propose a methodology for detection and estimation of aging in the UNIX operating system. First, we present the design and implementation of an SNMP-based, distributed monitoring tool used to collect operating system resource usage and system activity data at regular intervals from networked UNIX workstations. Statistical trend detection techniques are applied to this data to detect and validate the existence of aging. To quantify the effect of aging on operating system resources, we propose a metric, "estimated time to exhaustion", which is calculated using well-known slope estimation techniques. Although the distributed data collection tool is specific to UNIX, the statistical techniques can be used for detection and estimation of aging in other software as well.
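The detection and estimation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific techniques, so the choices here are assumptions: a Mann-Kendall test (normal approximation, no tie correction) stands in for the trend detection step, and Sen's median-of-pairwise-slopes estimator stands in for the slope estimation step. The function names, the `capacity` parameter, and the sample data are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall_z(values):
    """Mann-Kendall Z statistic (normal approximation, ties not corrected).

    Assumed stand-in for the "statistical trend detection" step.
    """
    n = len(values)
    # S counts later-minus-earlier sign over all ordered pairs.
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sen_slope(times, values):
    """Median of all pairwise slopes (Sen's estimator).

    Assumed stand-in for the "slope estimation" step.
    """
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(times)), 2)
        if times[j] != times[i]
    ]
    return median(slopes)


def estimated_time_to_exhaustion(times, used, capacity):
    """Time until `used` reaches `capacity` if the estimated trend continues.

    Returns math.inf when usage is not increasing.
    """
    slope = sen_slope(times, used)
    if slope <= 0:
        return math.inf
    return (capacity - used[-1]) / slope


# Hypothetical samples: hours since boot and used swap space in MB.
hours = [0, 1, 2, 3, 4]
used_mb = [10, 12, 14, 16, 18]

z = mann_kendall_z(used_mb)
slope = sen_slope(hours, used_mb)
tte = estimated_time_to_exhaustion(hours, used_mb, capacity=100)
```

A value of |Z| above roughly 1.96 would indicate a trend at the 5% significance level under the normal approximation. Only when such a trend is present does the time-to-exhaustion figure carry much meaning, which is why the sketch keeps detection and estimation as separate steps.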
This publication has 17 references indexed in Scilit:
- Identifying software problems using symptoms. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Software defects and their impact on system availability - a study of field failures in operating systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Measurement of failure rate in widely distributed software. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Structure and identification of management information for TCP/IP-based internets. Published by RFC Editor, 1990
- Automatic recognition of intermittent failures: an experimental study of field data. IEEE Transactions on Computers, 1990
- Error log analysis: statistical modeling and heuristic trend analysis. IEEE Transactions on Reliability, 1990
- A case study of Ethernet anomalies in a distributed computing environment. IEEE Transactions on Reliability, 1990
- Effect of System Workload on Operating System Reliability: A Study on IBM 3081. IEEE Transactions on Software Engineering, 1985
- Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association, 1979
- Estimates of the Regression Coefficient Based on Kendall's Tau. Journal of the American Statistical Association, 1968