A NUCA Substrate for Flexible CMP Cache Sharing
- 13 August 2007
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 18 (8), 1028-1040
- https://doi.org/10.1109/tpds.2007.1091
Abstract
We propose an organization for the on-chip memory system of a chip multiprocessor in which 16 processors share a 16-Mbyte pool of 64 level-2 (L2) cache banks. The L2 cache is organized as a nonuniform cache architecture (NUCA) array with a switched network embedded in it for high performance. We show that this organization can support a spectrum of degrees of sharing: unshared, in which each processor owns a private portion of the cache, thus reducing hit latency, and completely shared, in which every processor shares the entire cache, thus minimizing misses, and every point in between. We measure the optimal degree of sharing for different cache bank mapping policies and also evaluate a per-application cache partitioning strategy. We conclude that a static NUCA organization with sharing degrees of 2 or 4 works best across a suite of commercial and scientific parallel workloads. We demonstrate that migratory dynamic NUCA approaches improve performance significantly for a subset of the workloads at the cost of increased complexity, especially as per-application cache partitioning strategies are applied. We also evaluate the energy efficiency of each design point in terms of network traffic, bank accesses, and external memory accesses.
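The spectrum of sharing degrees described in the abstract can be made concrete with some simple arithmetic. The following Python sketch is illustrative only: it takes the 16 processors, 64 banks, and 16-Mbyte capacity from the abstract, and assumes that a sharing degree of d means d processors share one equal-sized partition of the banks. The function names `partition` and `group_of`, and the contiguous assignment of processors to groups, are hypothetical and not taken from the paper.

```python
# Illustrative arithmetic only. The constants come from the abstract;
# the equal-partition model and the grouping scheme are assumptions.
NUM_PROCESSORS = 16
NUM_BANKS = 64
TOTAL_CAPACITY_KB = 16 * 1024  # 16-Mbyte L2 pool


def partition(sharing_degree):
    """Return (groups, banks_per_group, kb_per_group) for a sharing degree.

    Assumes a sharing degree of d means d processors share one partition,
    so the degree must divide the processor count evenly.
    """
    if sharing_degree <= 0 or NUM_PROCESSORS % sharing_degree != 0:
        raise ValueError("sharing degree must evenly divide 16")
    groups = NUM_PROCESSORS // sharing_degree
    banks_per_group = NUM_BANKS // groups
    kb_per_group = TOTAL_CAPACITY_KB // groups
    return groups, banks_per_group, kb_per_group


def group_of(processor_id, sharing_degree):
    """Hypothetical contiguous assignment of a processor to its group."""
    return processor_id // sharing_degree


if __name__ == "__main__":
    for degree in (1, 2, 4, 8, 16):
        groups, banks, kb = partition(degree)
        print(f"degree {degree:2d}: {groups:2d} groups, "
              f"{banks:2d} banks and {kb:5d} KB per group")
```

Under these assumptions, degree 1 corresponds to the unshared case (each processor sees 4 banks, or 1 Mbyte) and degree 16 to the completely shared case (all 64 banks). The degrees of 2 and 4 that the abstract reports as working best sit between the two, trading some hit latency for a larger effective capacity per processor.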