Impact of heterogeneity on DSM performance

7 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 26-35
https://doi.org/10.1109/hpca.2000.824336

Abstract

This paper explores area/parallelism tradeoffs in the design of distributed shared-memory (DSM) multiprocessors built out of large single-chip computing nodes. In this context, area-efficiency arguments motivate a heterogeneous organization consisting of a few nodes with large caches designed for single-thread parallelism, and a larger number of nodes with smaller caches designed for multi-thread parallelism. Quantitative performance of such an organization is reported for a set of homogeneous multiprocessor programs from the SPLASH-2 benchmark suite. These programs are mapped onto the heterogeneous processors without source code modifications via static thread assignment policies. Simulation-based analysis is used to compare the performance of heterogeneous and homogeneous DSMs that occupy the same silicon area. The analysis shows that a 4-node heterogeneous DSM with 21 processors outperforms its homogeneous counterpart with 4 processors by an average of 36% for the studied multiprocessor workload, while having the same performance for sequential codes. A sensitivity analysis based on a factorial design experiment is used to study the implications of processor, memory, and network heterogeneity on the overall cost and performance of a heterogeneous DSM. The studied benchmarks are affected, on average, primarily by heterogeneity in processor performance (59.3%), followed by cache sizes (18.2%), memory latency (14.6%), and network latency (5.6%).
