Impact of heterogeneity on DSM performance
- 7 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
This paper explores area/parallelism tradeoffs in the design of distributed shared-memory (DSM) multiprocessors built out of large single-chip computing nodes. In this context, area-efficiency arguments motivate a heterogeneous organization consisting of a few nodes with large caches designed for single-thread parallelism, and a larger number of nodes with smaller caches designed for multi-thread parallelism. The quantitative performance of such an organization is reported for a set of homogeneous multiprocessor programs from the SPLASH-2 benchmark suite. These programs are mapped onto the heterogeneous processors without source code modifications via static thread assignment policies. Simulation-based analysis is used to compare the performance of heterogeneous and homogeneous DSMs that occupy the same silicon area. The analysis shows that a 4-node heterogeneous DSM with 21 processors outperforms its homogeneous counterpart with 4 processors by an average of 36% for the studied multiprocessor workload, while having the same performance for sequential codes. A sensitivity analysis based on a factorial design experiment is used to study the implications of processor, memory, and network heterogeneity on the overall cost and performance of a heterogeneous DSM. The studied benchmarks are affected, on average, primarily by heterogeneity in processor performance (59.3%), followed by cache sizes (18.2%), memory latency (14.6%), and network latency (5.6%).
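The percentages quoted for the sensitivity analysis are the kind of result a two-level full factorial design produces through allocation of variation. The following is a minimal sketch of that general technique, not the paper's actual procedure: the function name `allocation_of_variation`, the factor names, and every response value are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product
from math import prod


def allocation_of_variation(factors, responses):
    """Percent of total variation explained by each effect in a 2^k design.

    factors:   list of factor names (hypothetical labels).
    responses: dict mapping a tuple of levels (-1 or +1, one per factor)
               to the measured response for that configuration.
    """
    k = len(factors)
    runs = list(product((-1, 1), repeat=k))
    n = len(runs)

    sum_of_squares = {}
    # Visit every main effect (order 1) and every interaction (order > 1).
    for order in range(1, k + 1):
        for subset in combinations(range(k), order):
            # Effect = average of the responses weighted by the sign column,
            # where the sign is the product of the levels of the chosen factors.
            effect = sum(
                prod(run[i] for i in subset) * responses[run] for run in runs
            ) / n
            name = "*".join(factors[i] for i in subset)
            sum_of_squares[name] = n * effect * effect

    total = sum(sum_of_squares.values())
    if total == 0:
        # All responses are identical, so there is no variation to allocate.
        return {name: 0.0 for name in sum_of_squares}
    return {name: 100.0 * ss / total for name, ss in sum_of_squares.items()}


# Invented additive model used only to generate sample responses.
# These coefficients are NOT taken from the paper.
factors = ["processor", "cache", "memory", "network"]
weights = [8.0, 4.0, 3.0, 2.0]
responses = {
    run: 100.0 + sum(w * level for w, level in zip(weights, run))
    for run in product((-1, 1), repeat=len(factors))
}

shares = allocation_of_variation(factors, responses)
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:40s} {pct:6.2f}%")
```

Each factor is run at a low (-1) and a high (+1) level, and the share attributed to a factor is its sum of squares divided by the total. Because the invented model is purely additive, all interaction terms come out as zero; measured data would generally assign some variation to interactions as well.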