A first-order fine-grained multithreaded throughput model

1 February 2009

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 329-340
https://doi.org/10.1109/hpca.2009.4798270

Abstract

Analytical modeling is an alternative to detailed performance simulation with the potential to shorten the development cycle and provide additional insights. This paper proposes analytical models for predicting the cache contention and throughput of heavily multithreaded architectures such as Sun Microsystems' Niagara. First, it proposes a novel probabilistic model to accurately predict the number of extra cache misses due to cache contention for significantly larger numbers of threads than possible with prior analytical cache contention models. Then it presents a Markov chain model for analytically estimating the throughput of multicore, fine-grained multithreaded architectures. The Markov model uses the number of stalled threads as the states and calculates transition probabilities based upon the rates and latencies of events stalling a thread. By modeling the overlapping of the stalls among threads and taking account of cache contention, our models accurately predict system throughput obtained from a cycle-accurate performance simulator with an average error of 7.9%. We also demonstrate the application of our model to a design problem, optimizing the design of fine-grained multithreaded chip multiprocessors for application-specific workloads, where it yields the same result as detailed simulations 65 times faster. Moreover, this paper shows that our models accurately predict cache contention and throughput trends across varying workloads on real hardware, a Sun Fire T1000 server.
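
To make the Markov-chain idea in the abstract concrete, the following is a minimal sketch and not the paper's actual model. It keeps only the core notion that the state is the number of stalled threads. Everything else is an assumption made for illustration: a single-issue core, one instruction issued per cycle whenever at least one thread is ready, a fixed per-instruction stall probability `p_stall`, geometrically distributed stall durations with mean `mean_latency` cycles, and no cache contention. All function and parameter names are hypothetical.

```python
from math import comb


def transition_matrix(n_threads, p_stall, mean_latency):
    """Build P[s][t]: probability of moving from s to t stalled threads in one cycle.

    Assumptions (illustrative only): each stalled thread wakes independently
    with probability 1/mean_latency per cycle, and the one instruction issued
    in a cycle stalls its thread with probability p_stall.
    """
    q = 1.0 / mean_latency  # per-cycle wake-up probability of a stalled thread
    size = n_threads + 1
    P = [[0.0] * size for _ in range(size)]
    for s in range(size):
        for k in range(s + 1):  # k of the s stalled threads wake up this cycle
            p_wake = comb(s, k) * q**k * (1 - q) ** (s - k)
            if s < n_threads:
                # A ready thread issues; it may stall as a result.
                P[s][s - k + 1] += p_wake * p_stall
                P[s][s - k] += p_wake * (1 - p_stall)
            else:
                # Every thread is stalled, so nothing issues this cycle.
                P[s][s - k] += p_wake
    return P


def steady_state(P, tol=1e-12, max_iter=100_000):
    """Approximate the stationary distribution by power iteration."""
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(size)) for j in range(size)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


def throughput(n_threads, p_stall, mean_latency):
    """Instructions per cycle: the fraction of cycles with a ready thread."""
    pi = steady_state(transition_matrix(n_threads, p_stall, mean_latency))
    return 1.0 - pi[n_threads]


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(throughput(n, p_stall=0.05, mean_latency=20), 3))
```

Under these assumptions the single-thread case reduces to a two-state chain whose throughput is q / (p_stall + q) with q = 1 / mean_latency, which gives a quick sanity check on the numerical result. Adding threads lowers the probability that all of them are stalled at once, which is the overlap effect the abstract refers to.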
