CQoS: A Framework for Enabling QoS in Shared Caches of CMP Platforms
- 26 June 2004
- Conference paper
- Published by the Association for Computing Machinery (ACM)
- pp. 257-266
- https://doi.org/10.1145/1006209.1006246
Abstract
Cache hierarchies have been traditionally designed for usage by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and their workloads range from single-threaded and multi-threaded applications to complex virtual machines (VMs), a shared cache resource will be consumed by these different entities generating heterogeneous memory access streams exhibiting different locality properties and varying memory sensitivity. As a result, conventional cache management approaches that treat all memory accesses equally are bound to result in inefficient space utilization and poor performance even for applications with good locality properties. To address this problem, this paper presents a new cache management framework (CQoS) that (1) recognizes the heterogeneity in memory access streams, (2) introduces the notion of QoS to handle the varying degrees of locality and latency sensitivity and (3) assigns and enforces priorities to streams based on latency sensitivity, locality degree and application performance needs. To achieve this, we propose CQoS options for priority classification, priority assignment and priority enforcement. We briefly describe CQoS priority classification and assignment options -- ranging from user-driven and developer-driven to compiler-detected and flow-based approaches. Our focus in this paper is on CQoS mechanisms for priority enforcement -- these include (1) selective cache allocation, (2) static/dynamic set partitioning and (3) heterogeneous cache regions. We discuss the architectural design and implementation complexity of these CQoS options. To evaluate the performance trade-offs for these options, we have modeled these CQoS options in a cache simulator and evaluated their performance in CMP platforms running network-intensive server workloads. Our simulation results show the effectiveness of our proposed options and make the case for CQoS in future multi-threaded/multi-core platforms since it improves shared cache efficiency and increases overall system performance as a result.
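To make the second enforcement mechanism named in the abstract (static set partitioning) more concrete, here is a minimal Python sketch of a set-associative cache whose sets are statically split between a high-priority and a low-priority stream, so low-priority traffic cannot evict lines in the high-priority region. This is not the paper's simulator; the class name, parameters (num_sets, ways, high_frac) and two-class setup are illustrative assumptions.

```python
# Hypothetical sketch of static set partitioning between two priority classes.
# Not the CQoS simulator from the paper; sizes and the index scheme are assumptions.
from collections import OrderedDict

class SetPartitionedCache:
    def __init__(self, num_sets=1024, ways=8, block_bytes=64, high_frac=0.75):
        self.ways = ways
        self.block_bits = block_bytes.bit_length() - 1
        # Static partition: the first `high_sets` sets serve the high-priority
        # class, the remaining sets serve the low-priority class.
        self.high_sets = int(num_sets * high_frac)
        self.low_sets = num_sets - self.high_sets
        # One OrderedDict per set: block tag -> True, ordered by recency (LRU).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = self.misses = 0

    def _index(self, addr, high_priority):
        block = addr >> self.block_bits
        if high_priority:
            return block % self.high_sets          # index within the high region
        return self.high_sets + block % self.low_sets  # index within the low region

    def access(self, addr, high_priority):
        s = self.sets[self._index(addr, high_priority)]
        tag = addr >> self.block_bits
        if tag in s:
            s.move_to_end(tag)                     # LRU update on hit
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)                  # evict LRU victim in this set
        s[tag] = True
        return False

# Usage: a streaming low-priority scan cannot displace the latency-sensitive line.
cache = SetPartitionedCache()
cache.access(0x1000, high_priority=True)
for addr in range(0, 1 << 20, 64):
    cache.access(addr, high_priority=False)
assert cache.access(0x1000, high_priority=True)    # still resident after the scan
```

The same skeleton could be adapted to the other two mechanisms the abstract lists, for example by gating allocation on a per-class probability (selective cache allocation) or by giving each class a per-set way quota.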