Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems
- 20 January 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 282-293
- https://doi.org/10.1109/isca.1999.765958
Abstract
The performance of page-based software shared virtual memory (SVM) is still far from that achieved on hardware-coherent distributed shared memory (DSM) systems. The interrupt cost for asynchronous protocol processing has been found to be a key source of performance loss and complexity. This paper shows that by providing simple and general support for asynchronous message handling in a commodity network interface (NI), and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. They also require neither visibility into the node memory system nor code instrumentation to identify memory operations. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support in a shared Memory Abstraction), on a cluster of SMPs with a programmable NI, though the mechanisms are simple and do not require programmability. We find that the performance improvements are substantial, bringing performance on a small-scale SMP cluster much closer to that of hardware-coherent shared memory for many applications, and we show the value of each of the mechanisms in different applications. Application performance improves by about 37% on average for reasonably well performing applications, even on our relatively slow programmable NI, and more for others. We discuss the key remaining bottlenecks at the protocol level and use a firmware performance monitor in the NI to understand the interactions with and the implications for the communication layer.