Adaptive Cache Compression for High-Performance Processors

2 March 2004

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 32 (2), 212
https://doi.org/10.1145/1028176.1006719

Abstract

Modern processors use two or more levels of cache memories to bridge the rising disparity between processor and memory speeds. Compression can improve cache performance by increasing effective cache capacity and eliminating misses. However, decompressing cache lines also increases cache access latency, potentially degrading performance.

In this paper, we develop an adaptive policy that dynamically adapts to the costs and benefits of cache compression. We propose a two-level cache hierarchy where the L1 cache holds uncompressed data and the L2 cache dynamically selects between compressed and uncompressed storage. The L2 cache is 8-way set-associative with LRU replacement, where each set can store up to eight compressed lines but has space for only four uncompressed lines. On each L2 reference, the LRU stack depth and compressed size determine whether compression (could have) eliminated a miss or incurs an unnecessary decompression overhead. Based on this outcome, the adaptive policy updates a single global saturating counter, which predicts whether to allocate lines in compressed or uncompressed form.

We evaluate adaptive cache compression using full-system simulation and a range of benchmarks. We show that compression can improve performance for memory-intensive commercial workloads by up to 17%. However, always using compression hurts performance for low-miss-rate benchmarks (due to unnecessary decompression overhead), degrading performance by up to 18%. By dynamically monitoring workload behavior, the adaptive policy achieves comparable benefits from compression, while never degrading performance by more than 0.4%.
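The adaptive policy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract specifies only the set geometry (eight tags, room for four uncompressed lines) and the use of one global saturating counter. Everything else here is an assumption made for illustration: the class and method names, the penalty weights, the saturation limit, the rule that a positive counter favors compression, and the `fits_if_compressed` flag standing in for the compressed-size check.

```python
from dataclasses import dataclass

UNCOMPRESSED_WAYS = 4  # from the abstract: room for four uncompressed lines per set
MAX_TAGS = 8           # from the abstract: up to eight compressed lines per set


@dataclass
class AdaptivePredictor:
    """Hypothetical global saturating counter; all defaults are assumed values."""
    miss_penalty: int = 400      # assumed benefit (cycles) of an avoided L2 miss
    decompress_penalty: int = 5  # assumed cost (cycles) of one decompression
    limit: int = 1 << 19         # assumed saturation bound
    counter: int = 0

    def _bump(self, delta):
        # Saturating update: clamp the counter to [-limit, +limit].
        self.counter = max(-self.limit, min(self.limit, self.counter + delta))

    def on_reference(self, hit, stack_depth, stored_compressed, fits_if_compressed):
        """Classify one L2 reference.

        stack_depth is the LRU stack position of the matching tag
        (1 = most recently used), or None if no tag in the set matches.
        """
        if stack_depth is None:
            return  # no tag match: compression could not have helped
        if stack_depth <= UNCOMPRESSED_WAYS:
            # The line would be resident even without compression.
            if hit and stored_compressed:
                self._bump(-self.decompress_penalty)  # unnecessary decompression
        elif hit:
            self._bump(self.miss_penalty)  # compression eliminated a miss
        elif fits_if_compressed:
            self._bump(self.miss_penalty)  # compression could have eliminated a miss

    def allocate_compressed(self):
        # Assumed decision rule: compress only while the counter is positive.
        return self.counter > 0
```

In this sketch, a hit deep in the LRU stack (say `stack_depth=6`) pushes the counter toward compression, while a hit on a compressed line near the top of the stack pushes it back by the smaller decompression cost. Weighting the two outcomes differently reflects the abstract's framing of weighing costs against benefits; the specific weights are placeholders.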
