Multi-execution

15 June 2009

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 37 (3), 164-173
https://doi.org/10.1145/1555815.1555777

Abstract

While microprocessor designers turn to multicore architectures to sustain performance expectations, the dramatic increase in parallelism of such architectures will put substantial demands on off-chip bandwidth and make the memory wall more significant than ever. This paper demonstrates that one profitable application of multicore processors is the execution of many similar instantiations of the same program. We show that this model of execution is used in several practical scenarios and term it "multi-execution." Often, each such instance uses very similar data. In conventional cache hierarchies, each instance would cache its own data independently. We propose the Mergeable cache architecture, which detects data similarities and merges cache blocks, resulting in substantial savings in cache storage requirements. This reduces off-chip memory accesses and overall power usage, and increases application performance. We present cycle-accurate simulation results for 8 benchmarks (6 from SPEC2000) to demonstrate that our technique provides a scalable solution and leads to significant speedups due to reductions in main memory accesses. For 8 cores running 8 similar executions of the same application and sharing an exclusive 4-MB, 8-way L2 cache, the Mergeable cache achieves an average speedup of 2.5x (ranging from 0.93x to 6.92x), while adding an overhead of only 4.28% in cache area and 5.21% in power when it is in use.
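
The abstract does not spell out how the Mergeable cache detects similar blocks, so the following is only a minimal software sketch of the general idea of merging identical blocks across instances. It is not the paper's hardware design. The class name MergingStore, the 64-byte block size, and the 90/10 split between shared and private blocks are all assumptions made for illustration.

```python
from collections import defaultdict

BLOCK_SIZE = 64  # bytes per block (assumed, not taken from the paper)


class MergingStore:
    """Toy block store keeping one physical copy per distinct block content."""

    def __init__(self):
        self.tag_to_data = {}             # (instance, address) -> block contents
        self.refcount = defaultdict(int)  # block contents -> number of tags sharing it

    def write(self, instance, address, data):
        key = (instance, address)
        old = self.tag_to_data.get(key)
        if old is not None:
            self._release(old)
        # Remapping only this tag leaves other instances' views untouched,
        # which gives copy-on-write behaviour for merged blocks.
        self.tag_to_data[key] = data
        self.refcount[data] += 1

    def _release(self, data):
        self.refcount[data] -= 1
        if self.refcount[data] == 0:
            del self.refcount[data]

    def read(self, instance, address):
        return self.tag_to_data[(instance, address)]

    def logical_blocks(self):
        # What a conventional cache would store: one copy per instance and address.
        return len(self.tag_to_data)

    def physical_blocks(self):
        # What is stored after merging: one copy per distinct content.
        return len(self.refcount)


def build_demo(instances=8, shared=90, private=10):
    """Fill the store with hypothetical data: most blocks identical across instances."""
    store = MergingStore()
    for inst in range(instances):
        for addr in range(shared):
            data = f"shared-{addr}".encode().ljust(BLOCK_SIZE, b"\0")
            store.write(inst, addr, data)
        for addr in range(shared, shared + private):
            data = f"private-{inst}-{addr}".encode().ljust(BLOCK_SIZE, b"\0")
            store.write(inst, addr, data)
    return store


if __name__ == "__main__":
    store = build_demo()
    # 8 instances x 100 addresses = 800 logical blocks;
    # 90 shared + 8 x 10 private = 170 physical blocks.
    print(store.logical_blocks(), store.physical_blocks())
```

In this toy setting, eight instances that would occupy 800 blocks in independent caches need only 170 physical blocks once identical contents are merged. The real savings depend on how similar the instances' data actually is, which this sketch simply assumes.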
