Identifying software problems using symptoms
- 17 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 320-329
- https://doi.org/10.1109/ftcs.1994.315628
Abstract
This paper presents an approach to automatically identify recurrent software failures using symptoms, in environments where many users run the same software. The approach is based on observations that the majority of field software failures in such environments are recurrences and that failures due to a single fault often share common symptoms. The paper proposes the comparison of failure symptoms, such as stack traces and symptom strings, as a strategy for identifying recurrences. This diagnosis strategy is applied using the actual field software failure data. The results obtained are compared with the diagnosis and repair logs by analysts. Results of such comparisons using the failure, diagnosis, and repair logs in two Tandem system software products show that between 75% and 95% of recurrences can be identified successfully by matching stack traces and symptom strings. Less than 10% of faults are misdiagnosed. These results indicate that automatic identification of recurrences based on their symptoms is possible.
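The idea of identifying recurrences by comparing stack traces can be illustrated with a minimal sketch. This is not the paper's actual matching procedure, which the abstract does not specify. The function names, the choice of comparing only the top three frames, the case-insensitive normalization, and the sample failure records are all assumptions made for illustration.

```python
from collections import defaultdict

def signature(stack_trace, depth=3):
    """Build a failure signature from the top `depth` frames.

    Assumption: frames are plain strings, innermost first, and two
    failures "match" when their normalized top frames are identical.
    """
    return tuple(frame.strip().lower() for frame in stack_trace[:depth])

def group_recurrences(failures, depth=3):
    """Group failure IDs whose stack-trace signatures are identical.

    `failures` is an iterable of (failure_id, stack_trace) pairs.
    Each group with more than one ID is a candidate set of recurrences.
    """
    groups = defaultdict(list)
    for failure_id, stack_trace in failures:
        groups[signature(stack_trace, depth)].append(failure_id)
    return dict(groups)

# Hypothetical failure reports, invented for this example.
failures = [
    ("F1", ["msg_send", "io_dispatch", "main"]),
    ("F2", ["MSG_SEND ", "io_dispatch", "main"]),
    ("F3", ["alloc_page", "mem_init", "main"]),
]

groups = group_recurrences(failures)
for sig, ids in groups.items():
    print(sig, ids)
```

In this sketch F1 and F2 fall into one group because their normalized frames agree, while F3 stands alone. A real system would also need a rule for symptom strings and for partially matching traces, neither of which is attempted here.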