A first-order fine-grained multithreaded throughput model

Abstract
Analytical modeling is an alternative to detailed performance simulation with the potential to shorten the development cycle and provide additional insights. This paper proposes analytical models for predicting the cache contention and throughput of heavily multithreaded architectures such as Sun Microsystems' Niagara. First, it proposes a novel probabilistic model that accurately predicts the number of extra cache misses due to cache contention for significantly larger numbers of threads than is possible with prior analytical cache contention models. Then it presents a Markov chain model for analytically estimating the throughput of multicore, fine-grained multithreaded architectures. The Markov model uses the number of stalled threads as its states and calculates transition probabilities from the rates and latencies of the events that stall a thread. By modeling the overlap of stalls among threads and accounting for cache contention, our models predict the system throughput obtained from a cycle-accurate performance simulator with an average error of 7.9%. We also demonstrate the application of our model to a design problem, optimizing the design of fine-grained multithreaded chip multiprocessors for application-specific workloads, where it yields the same result as detailed simulation while running 65 times faster. Moreover, this paper shows that our models accurately predict cache contention and throughput trends across varying workloads on real hardware, a Sun Fire T1000 server.
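To make the Markov chain idea concrete, the following is a minimal Python sketch, not the paper's actual model. It keeps only what the abstract states: the state is the number of stalled threads, and transition probabilities follow from a stall rate and a stall latency. Everything else is an assumption made for illustration: a single core that issues one instruction per cycle from one ready thread, a single kind of stall event with per-instruction probability `p_stall`, and a geometrically distributed stall duration with mean `latency` cycles. The function names are hypothetical.

```python
from math import comb


def transition_matrix(n_threads, p_stall, latency):
    """Build P[k][j]: probability of moving from k stalled threads to j.

    Assumptions (illustrative only): each stalled thread resumes
    independently with probability 1/latency per cycle, and at most one
    thread (the one issuing this cycle) can newly stall.
    """
    q = 1.0 / latency
    size = n_threads + 1
    P = [[0.0] * size for _ in range(size)]
    for k in range(size):
        for r in range(k + 1):
            # Probability that exactly r of the k stalled threads resume.
            p_resume = comb(k, r) * q**r * (1 - q) ** (k - r)
            if k < n_threads:
                # A ready thread issues; it stalls with probability p_stall.
                P[k][k - r + 1] += p_resume * p_stall
                P[k][k - r] += p_resume * (1 - p_stall)
            else:
                # All threads are stalled, so nothing issues this cycle.
                P[k][k - r] += p_resume
    return P


def stationary(P, iters=2000):
    """Approximate the stationary distribution by power iteration."""
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(size)) for j in range(size)]
    return pi


def throughput(n_threads, p_stall, latency):
    """Estimated instructions per cycle: fraction of cycles in which
    at least one thread is ready to issue."""
    pi = stationary(transition_matrix(n_threads, p_stall, latency))
    return 1.0 - pi[n_threads]


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(throughput(n, p_stall=0.1, latency=10), 3))
```

Under these assumptions the estimate rises with the thread count because stalls of different threads overlap, which is the effect the abstract says the model captures. Cache contention is not modeled here; in a fuller treatment it would presumably enter by making the stall probability depend on the number of threads sharing the cache.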
