Large-scale fault isolation
- 1 May 2000
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Journal on Selected Areas in Communications
- Vol. 18 (5), 733-743
- https://doi.org/10.1109/49.842989
Abstract
Of the many distributed applications designed for the Internet, the successful ones are those that have paid careful attention to scale and robustness. These applications share several design principles. In this paper, we illustrate the application of these principles to common network monitoring tasks. Specifically, we describe and evaluate 1) a robust distributed topology discovery mechanism and 2) a mechanism for scalable fault isolation in multicast distribution trees. Our mechanisms reveal a different design methodology for network monitoring: one that carefully trades off monitoring fidelity (where necessary) for more graceful degradation in the presence of different kinds of network dynamics.
This publication has 24 references indexed in Scilit:
- Fault detection in routing protocols. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Proactive network fault detection. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Distributed management by delegation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Heuristics for Internet map discovery. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Organizing multicast receivers deterministically by packet-loss correlation. Published by Association for Computing Machinery (ACM), 1998
- Internet routing instability. Published by Association for Computing Machinery (ACM), 1997
- Network text editor (NTE). ACM SIGCOMM Computer Communication Review, 1997
- Using multicast-SNMP to coordinate distributed management agents. Published by Institute of Electrical and Electronics Engineers (IEEE), 1996
- The Harvest information discovery and access system. Computer Networks and ISDN Systems, 1995
- Congestion avoidance and control. Published by Association for Computing Machinery (ACM), 1988