Cooperative Caching for Chip Multiprocessors
- 1 May 2006
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 34 (2), 264-276
- https://doi.org/10.1145/1150019.1136509
Abstract
This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through cooperation among private caches. Locally active data are attracted to the private caches by their accessing processors to reduce remote on-chip references, while globally active data are cooperatively identified and kept in the aggregate cache to reduce off-chip accesses. Examples of cooperation include cache-to-cache transfers of clean data, replication-aware data replacement, and global replacement of inactive data. These policies can be implemented by modifying an existing cache replacement policy and cache coherence protocol, or by the new implementation of a directory-based protocol presented in this paper. Our evaluation using full-system simulation shows that cooperative caching achieves an off-chip miss rate similar to that of a shared cache, and a local cache hit rate similar to that of using private caches. Cooperative caching performs robustly over a range of system/cache sizes and memory latencies. For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared with private caches. For a 4-core CMP running multiprogrammed SPEC2000 workloads, cooperative caching is on average 11% and 6% faster than shared and private cache organizations, respectively.
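The abstract names replication-aware data replacement without giving its details, so the following is only an illustrative sketch of one way such a policy could choose a victim within a single cache set. It assumes each block carries a flag saying whether another on-chip cache also holds a copy, and it prefers to evict such replicated blocks before blocks that exist in only one on-chip cache. The names `Block`, `replicated`, `last_used`, and `pick_victim` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    tag: int
    last_used: int    # larger value means more recently used
    replicated: bool  # assumption: True if another on-chip cache also holds this block


def pick_victim(cache_set: List[Block]) -> Block:
    """Choose a block to evict from one cache set.

    Assumed policy: evict the least recently used replicated block if any
    exists; otherwise fall back to plain LRU over the whole set.
    """
    if not cache_set:
        raise ValueError("cache set is empty")
    replicas = [b for b in cache_set if b.replicated]
    candidates = replicas if replicas else cache_set
    return min(candidates, key=lambda b: b.last_used)
```

Under this assumption, evicting a replicated block does not remove the data from the chip, because another cache still holds it. That is one plausible way to keep more distinct blocks in the aggregate cache and so reduce off-chip accesses.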