End-to-end service failure diagnosis using belief networks
- 25 June 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
We present fault localization techniques suitable for diagnosing end-to-end service problems in communication systems with complex topologies. We refine a layered system model that represents relationships between services and functions offered between neighboring protocol layers. In a given layer, an end-to-end service between two hosts may be provided using multiple host-to-host services offered in this layer between pairs of hosts on the end-to-end path. Relationships among end-to-end and host-to-host services form a bipartite probabilistic dependency graph whose structure depends on the network topology in the corresponding protocol layer. When an end-to-end service fails or experiences performance problems, it is important to efficiently find the responsible host-to-host services. Finding the most probable explanation (MPE) of the observed symptoms is NP-hard. We propose two fault localization techniques based on Pearl's (1988) iterative algorithms for singly connected belief networks. The probabilistic dependency graph is transformed into a belief network, and then the approximations based on Pearl's algorithms and an exact bucket tree elimination algorithm are designed and evaluated through an extensive simulation study.
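To make the MPE problem in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes a toy bipartite model in which each end-to-end service depends on the host-to-host services on its path, combined with a noisy-OR rule; the topology, the probabilities, and all names (`PRIOR`, `PATHS`, `CAUSE`, `most_probable_explanation`) are hypothetical. The search enumerates every fault combination, which is exponential in the number of host-to-host services and is exactly the cost that approximate techniques aim to avoid.

```python
from itertools import product

# Hypothetical toy model; every name and number here is an illustrative assumption.
# Prior probability that each host-to-host service has failed.
PRIOR = {"A-B": 0.01, "B-C": 0.05, "C-D": 0.02}

# Bipartite dependency graph: end-to-end service -> host-to-host services on its path.
PATHS = {
    "A->C": ["A-B", "B-C"],
    "B->D": ["B-C", "C-D"],
    "A->B": ["A-B"],
}

# Assumed probability that a failed host-to-host service breaks a dependent service.
CAUSE = 0.9


def likelihood(faults, observed):
    """P(observed end-to-end states | set of failed host-to-host services), noisy-OR."""
    p = 1.0
    for service, links in PATHS.items():
        p_ok = 1.0
        for link in links:
            if link in faults:
                p_ok *= 1.0 - CAUSE  # each failed link independently may break the service
        p *= (1.0 - p_ok) if observed[service] else p_ok
    return p


def most_probable_explanation(observed):
    """Brute-force MPE: try all 2^n fault sets and keep the most probable one."""
    links = list(PRIOR)
    best, best_p = None, -1.0
    for bits in product([False, True], repeat=len(links)):
        faults = {link for link, failed in zip(links, bits) if failed}
        p = 1.0
        for link in links:
            p *= PRIOR[link] if link in faults else 1.0 - PRIOR[link]
        p *= likelihood(faults, observed)
        if p > best_p:
            best, best_p = faults, p
    return best, best_p


if __name__ == "__main__":
    # True means the end-to-end service was observed to fail.
    symptoms = {"A->C": True, "B->D": True, "A->B": False}
    print(most_probable_explanation(symptoms))
```

With these assumed numbers, the two failing services share only the host-to-host service `B-C`, and the search returns `{"B-C"}` as the most probable explanation. With n host-to-host services the loop visits 2^n candidates, so this approach is only usable on very small examples.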
This publication has 20 references indexed in Scilit:
- Increasing robustness of fault localization through analysis of lost, spurious, and positive symptoms. Institute of Electrical and Electronics Engineers (IEEE), 2003
- A conceptual framework for network management event correlation and filtering systems. Institute of Electrical and Electronics Engineers (IEEE), 2003
- Combinatorial designs in multiple faults localization for battlefield networks. Institute of Electrical and Electronics Engineers (IEEE), 2002
- Non-deterministic diagnosis of end-to-end service failures in a multi-layer communication system. Institute of Electrical and Electronics Engineers (IEEE), 2002
- Layered model for supporting fault isolation and recovery. Institute of Electrical and Electronics Engineers (IEEE), 2002
- Fault Isolation and Event Correlation for Integrated Fault Management. Springer Nature, 1997
- High speed and robust event correlation. IEEE Communications Magazine, 1996
- Bayesian networks. Communications of the ACM, 1995
- Identification of faulty links in dynamic-routed networks. IEEE Journal on Selected Areas in Communications, 1993
- Alarm correlation. IEEE Network, 1993