Resource-aware stream management with the customizable dproc distributed monitoring mechanisms
- 24 January 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10828907, p. 250-259
- https://doi.org/10.1109/hpdc.2003.1210034
Abstract
Monitoring the resources of distributed systems is essential to the successful deployment and execution of grid applications, particularly when such applications have well-defined QoS requirements. The dproc system-level monitoring mechanisms implemented for standard Linux kernels have several key components. First, utilizing the familiar /proc filesystem, dproc extends this interface with resource information collected from both local and remote hosts. Second, to predictably capture and distribute monitoring information, dproc uses a kernel-level group communication facility, termed KECho, which is based on events and event channels. Third, and the focus of this paper, is dproc's run-time customizability for resource monitoring, which includes the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that: (a) data streams can be customized according to a client's resource availability (dynamic stream management); (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overheads and application quality; and (c) by performing monitoring at kernel level, the information captured enables decision making that takes into account the multiple resources used by applications.
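The dynamic filtering idea in point (b) can be illustrated with a small user-space sketch. This is not dproc or KECho code: `FilteredChannel`, `parse_loadavg`, and the threshold rule are hypothetical names and assumptions chosen for illustration, and the real system performs this work inside the kernel. The sketch forwards a load sample to subscribers only when it differs enough from the last forwarded value, and lets the threshold be changed at run time.

```python
from typing import Callable, List, Optional


def parse_loadavg(text: str) -> float:
    """Return the 1-minute load average from /proc/loadavg-style text."""
    return float(text.split()[0])


class FilteredChannel:
    """Hypothetical event channel (an assumption, not the KECho API).

    A sample is forwarded only if it differs from the last forwarded
    value by at least `threshold`, trading freshness for lower overhead.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # may be changed at run time
        self.subscribers: List[Callable[[float], None]] = []
        self._last_sent: Optional[float] = None

    def subscribe(self, callback: Callable[[float], None]) -> None:
        self.subscribers.append(callback)

    def submit(self, value: float) -> bool:
        """Forward `value` if it passes the filter; return True if sent."""
        if (self._last_sent is not None
                and abs(value - self._last_sent) < self.threshold):
            return False  # change too small: suppress the update
        self._last_sent = value
        for callback in self.subscribers:
            callback(value)
        return True


# Usage: made-up samples in the /proc/loadavg line format.
received: List[float] = []
channel = FilteredChannel(threshold=0.5)
channel.subscribe(received.append)

channel.submit(parse_loadavg("0.10 0.20 0.30 1/100 1234"))  # first sample: sent
channel.submit(parse_loadavg("0.30 0.20 0.30 1/100 1235"))  # change 0.20: suppressed
channel.submit(parse_loadavg("0.90 0.40 0.30 2/100 1236"))  # change 0.80: sent

channel.threshold = 0.1  # a client asks for finer-grained monitoring
channel.submit(parse_loadavg("1.05 0.50 0.30 2/100 1237"))  # change 0.15: sent
```

A larger threshold reduces the number of events distributed at the cost of staler information at the receivers; lowering it does the opposite.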
This publication has 11 references indexed in Scilit:
- Supermon: a high-speed cluster monitoring system. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- HiFi: a new monitoring architecture for distributed systems management. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Event services for high performance computing. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- dproc - Extensible Run-Time Resource Monitoring for Cluster Applications. Published by Springer Nature, 2002
- KECho - Event Communication for Distributed Kernel Services. Published by Springer Nature, 2002
- The Monitoring and Steering Environment. Published by Springer Nature, 2001
- PARMON: a portable and scalable monitoring system for clusters. Software: Practice and Experience, 2000
- Performance monitoring in a Myrinet-connected SHRIMP cluster. Published by Association for Computing Machinery (ACM), 1998
- GEM: a generalized event monitoring language for distributed systems. Distributed Systems Engineering, 1997
- The Paradyn parallel performance measurement tool. Computer, 1995