Exploring The Benefits Of Multiple Hardware Contexts In A Multiprocessor Architecture: Preliminary Results
- 24 August 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 273-280
- https://doi.org/10.1109/isca.1989.714562
Abstract
A fundamental problem that any scalable multiprocessor must address is the ability to tolerate high-latency memory operations. This paper explores the extent to which multiple hardware contexts per processor can help to mitigate the negative effects of high latency. In particular, we evaluate the performance of a directory-based cache-coherent multiprocessor using memory reference traces obtained from three parallel applications. We explore the case where there are a small fixed number (2-4) of hardware contexts per processor and the context switch overhead is low. In contrast to previously proposed approaches, we also use a very simple context switch criterion, namely a cache miss or a write-hit to shared data. Our results show that the effectiveness of multiple contexts depends on the nature of the applications, the context switch overhead, and the inherent latency of the machine architecture. Given reasonably low-overhead hardware context switches, we show that two or four contexts can achieve substantial performance gains over a single context. For one application, the processor utilization increased by about 46% with two contexts and by about 80% with four contexts.
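The dependence of utilization on context count, switch overhead, and latency that the abstract describes can be illustrated with a simple deterministic model. The sketch below is not the paper's trace-driven evaluation. It assumes that every context runs for a fixed number of cycles between switch events (a cache miss or a write-hit to shared data), that each event costs a fixed latency, and that each switch costs a fixed overhead. The function name, parameter names, and numeric values are hypothetical and chosen only for illustration.

```python
def utilization(n_contexts, run_length, latency, switch_overhead):
    """Fraction of processor cycles spent on useful work.

    Assumed (hypothetical) model, not taken from the paper:
      run_length      -- cycles a context runs between switch events
      latency         -- cycles needed to service one such event
      switch_overhead -- cycles lost on each context switch
    """
    if n_contexts < 1:
        raise ValueError("n_contexts must be at least 1")

    # A single context never switches; it stalls for the full latency.
    if n_contexts == 1:
        return run_length / (run_length + latency)

    # Too few contexts to hide the latency: each context completes one
    # run every (run_length + switch_overhead + latency) cycles.
    linear = n_contexts * run_length / (run_length + switch_overhead + latency)

    # Enough contexts to hide the latency completely: only the switch
    # overhead is lost.
    saturated = run_length / (run_length + switch_overhead)

    return min(linear, saturated)


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from the paper.
    for n in (1, 2, 4):
        print(n, round(utilization(n, run_length=20, latency=60, switch_overhead=4), 3))
```

Under these assumptions, adding contexts helps until the latency is fully hidden. Beyond that point utilization is limited by the switch overhead alone, which is consistent with the abstract's emphasis on low-overhead context switches.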