Detecting Application-Level Failures in Component-Based Internet Services
- 19 September 2005
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks
- Vol. 16 (5) , 1027-1041
- https://doi.org/10.1109/tnn.2005.853411
Abstract
Most Internet services (e-commerce, search engines, etc.) suffer faults. Quickly detecting these faults can be the largest bottleneck in improving availability of the system. We present Pinpoint, a methodology for automating fault detection in Internet services by: 1) observing low-level internal structural behaviors of the service; 2) modeling the majority behavior of the system as correct; and 3) detecting anomalies in these behaviors as possible symptoms of failures. Without requiring any a priori application-specific information, Pinpoint correctly detected 89%-96% of major failures in our experiments, as compared with 20%-70% detected by current application-generic techniques.
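The three steps named in the abstract (observe structural behavior, treat the majority as correct, flag deviations) can be illustrated with a minimal sketch. This is not the paper's algorithm: the per-callee median as the "majority" reference, the L1 distance score, the 0.5 threshold, and all names (`anomaly_scores`, `detect`, `replica-1`, and so on) are assumptions chosen only to make the idea concrete.

```python
from statistics import median


def normalize(counts):
    """Turn raw call counts into fractions so instances with different load are comparable."""
    total = sum(counts.values())
    return {callee: n / total for callee, n in counts.items()} if total else {}


def anomaly_scores(observations):
    """Score each component instance by its distance from the majority behavior.

    observations maps an instance name to {callee: call_count} (hypothetical shape).
    """
    profiles = {inst: normalize(counts) for inst, counts in observations.items()}
    callees = sorted({c for profile in profiles.values() for c in profile})
    # Assumed stand-in for "majority behavior": the per-callee median across instances.
    reference = {
        c: median(profile.get(c, 0.0) for profile in profiles.values())
        for c in callees
    }
    # L1 distance between each instance's profile and the reference profile.
    return {
        inst: sum(abs(profile.get(c, 0.0) - reference[c]) for c in callees)
        for inst, profile in profiles.items()
    }


def detect(observations, threshold=0.5):
    """Return instances whose score exceeds the (assumed) threshold."""
    scores = anomaly_scores(observations)
    return sorted(inst for inst, score in scores.items() if score > threshold)


if __name__ == "__main__":
    observed = {
        "replica-1": {"db": 50, "cache": 50},
        "replica-2": {"db": 48, "cache": 52},
        "replica-3": {"db": 51, "cache": 49},
        "replica-4": {"db": 5, "cache": 5, "error_page": 90},
    }
    print(detect(observed))
```

Nothing in the sketch encodes what the application is supposed to do; the reference profile is derived entirely from the observed instances, which is the sense in which such an approach needs no a priori application-specific information.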