A first-order fine-grained multithreaded throughput model
- 1 February 2009
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Analytical modeling is an alternative to detailed performance simulation with the potential to shorten the development cycle and provide additional insights. This paper proposes analytical models for predicting the cache contention and throughput of heavily multithreaded architectures such as Sun Microsystems' Niagara. First, it proposes a novel probabilistic model to accurately predict the number of extra cache misses due to cache contention for significantly larger numbers of threads than possible with prior analytical cache contention models. Then it presents a Markov chain model for analytically estimating the throughput of multicore, fine-grained multithreaded architectures. The Markov model uses the number of stalled threads as the states and calculates transition probabilities based upon the rates and latencies of events stalling a thread. By modeling the overlapping of the stalls among threads and taking account of cache contention, our models accurately predict system throughput obtained from a cycle-accurate performance simulator with an average error of 7.9%. We also demonstrate the application of our model to a design problem (optimizing the design of fine-grained multithreaded chip multiprocessors for application-specific workloads), yielding the same result as detailed simulations 65 times faster. Moreover, this paper shows that our models accurately predict cache contention and throughput trends across varying workloads on real hardware: a Sun Fire T1000 server.
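The Markov chain idea in the abstract (states are the number of stalled threads, transitions follow from stall rates and latencies) can be illustrated with a small sketch. This is not the paper's model. It is a deliberately simplified toy that assumes a single core issuing at most one instruction per cycle, a single stall type, a hypothetical per-instruction stall probability `p_stall`, and stall durations approximated as geometric with mean `latency` cycles. All function and parameter names are made up for illustration.

```python
from math import comb


def transition_matrix(n_threads, p_stall, latency):
    """Build P[k][j]: probability of moving from k to j stalled threads in one cycle.

    Assumptions (illustrative only): each stalled thread wakes up independently
    with probability 1/latency per cycle, and if any thread is ready, exactly one
    instruction issues and stalls its thread with probability p_stall.
    """
    q = 1.0 / latency
    n = n_threads
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        for woken in range(k + 1):
            # Binomial probability that `woken` of the k stalled threads resume.
            p_wake = comb(k, woken) * q**woken * (1.0 - q) ** (k - woken)
            remaining = k - woken
            if k < n:
                # At least one thread was ready, so one instruction issues.
                P[k][remaining + 1] += p_wake * p_stall
                P[k][remaining] += p_wake * (1.0 - p_stall)
            else:
                # Every thread was stalled: nothing issues this cycle.
                P[k][remaining] += p_wake
    return P


def stationary(P, max_iters=100_000, tol=1e-13):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def throughput(n_threads, p_stall, latency):
    """Instructions per cycle: the long-run fraction of cycles with a ready thread."""
    pi = stationary(transition_matrix(n_threads, p_stall, latency))
    return 1.0 - pi[-1]


if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        print(threads, round(throughput(threads, p_stall=0.1, latency=20), 3))
```

Under these toy assumptions, the single-thread case reduces to the closed form 1 / (1 + p_stall * latency), which gives a quick sanity check. Adding threads raises the estimated throughput because stalls overlap, which is the effect the abstract says the Markov model captures. The cache contention component described in the abstract is not modeled here.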