CQoS: A Framework for Enabling QoS in Shared Caches of CMP Platforms
- 26 June 2004
- Conference paper
- Published by the Association for Computing Machinery (ACM)
- pp. 257-266
- https://doi.org/10.1145/1006209.1006246
Abstract
Cache hierarchies have been traditionally designed for usage by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and their workloads range from single-threaded and multi-threaded applications to complex virtual machines (VMs), a shared cache resource will be consumed by these different entities generating heterogeneous memory access streams exhibiting different locality properties and varying memory sensitivity. As a result, conventional cache management approaches that treat all memory accesses equally are bound to result in inefficient space utilization and poor performance even for applications with good locality properties. To address this problem, this paper presents a new cache management framework (CQoS) that (1) recognizes the heterogeneity in memory access streams, (2) introduces the notion of QoS to handle the varying degrees of locality and latency sensitivity and (3) assigns and enforces priorities to streams based on latency sensitivity, locality degree and application performance needs. To achieve this, we propose CQoS options for priority classification, priority assignment and priority enforcement. We briefly describe CQoS priority classification and assignment options -- ranging from user-driven and developer-driven to compiler-detected and flow-based approaches. Our focus in this paper is on CQoS mechanisms for priority enforcement -- these include (1) selective cache allocation, (2) static/dynamic set partitioning and (3) heterogeneous cache regions. We discuss the architectural design and implementation complexity of these CQoS options. To evaluate the performance trade-offs for these options, we have modeled these CQoS options in a cache simulator and evaluated their performance in CMP platforms running network-intensive server workloads. Our simulation results show the effectiveness of our proposed options and make the case for CQoS in future multi-threaded/multi-core platforms since it improves shared cache efficiency and increases overall system performance as a result.
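To make the second enforcement mechanism named in the abstract (static set partitioning) more concrete, here is a minimal Python sketch of a set-associative cache whose sets are statically split between a high-priority and a low-priority stream, so low-priority traffic cannot evict lines in the high-priority region. This is not the paper's simulator; the class name, parameters (num_sets, ways, high_frac) and two-class setup are illustrative assumptions.

```python
# Hypothetical sketch of static set partitioning between two priority classes.
# Not the CQoS simulator from the paper; sizes and the index scheme are assumptions.
from collections import OrderedDict

class SetPartitionedCache:
    def __init__(self, num_sets=1024, ways=8, block_bytes=64, high_frac=0.75):
        self.ways = ways
        self.block_bits = block_bytes.bit_length() - 1
        # Static partition: the first `high_sets` sets serve the high-priority
        # class, the remaining sets serve the low-priority class.
        self.high_sets = int(num_sets * high_frac)
        self.low_sets = num_sets - self.high_sets
        # One OrderedDict per set: block tag -> True, ordered by recency (LRU).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = self.misses = 0

    def _index(self, addr, high_priority):
        block = addr >> self.block_bits
        if high_priority:
            return block % self.high_sets          # index within the high region
        return self.high_sets + block % self.low_sets  # index within the low region

    def access(self, addr, high_priority):
        s = self.sets[self._index(addr, high_priority)]
        tag = addr >> self.block_bits
        if tag in s:
            s.move_to_end(tag)                     # LRU update on hit
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)                  # evict LRU victim in this set
        s[tag] = True
        return False

# Usage: a streaming low-priority scan cannot displace the latency-sensitive line.
cache = SetPartitionedCache()
cache.access(0x1000, high_priority=True)
for addr in range(0, 1 << 20, 64):
    cache.access(addr, high_priority=False)
assert cache.access(0x1000, high_priority=True)    # still resident after the scan
```

The same skeleton could be adapted to the other two mechanisms the abstract lists, for example by gating allocation on a per-class probability (selective cache allocation) or by giving each class a per-set way quota.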