Coherent network interfaces for fine-grain communication
- 1 May 1996
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 24 (2), 247-258
- https://doi.org/10.1145/232974.232999
Abstract
Historically, processor accesses to memory-mapped device registers have been marked uncachable to ensure their visibility to the device. The ubiquity of snooping cache coherence, however, makes it possible for processors and devices to interact with cachable, coherent memory operations. Using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads (e.g., for polling).
This paper begins an exploration of network interfaces (NIs) that use coherence---coherent network interfaces (CNIs)---to improve communication performance. We restrict this study to NI/CNIs that reside on coherent memory or I/O buses, to NI/CNIs that are much simpler than processors, and to the performance of fine-grain messaging from user process to user process.
Our first contribution is to develop and optimize two mechanisms that CNIs use to communicate with processors. A cachable device register---derived from cachable control registers [39,40]---is a coherent, cachable block of memory used to transfer status, control, or data between a device and a processor. Cachable queues generalize cachable device registers from one cachable, coherent memory block to a contiguous region of cachable, coherent blocks managed as a circular queue.
Our second contribution is a taxonomy and comparison of four CNIs with a more conventional NI. Microbenchmark results show that CNIs can improve the round-trip latency and achievable bandwidth of a small 64-byte message by 37% and 125% respectively on the memory bus and 74% and 123% respectively on a coherent I/O bus. Experiments with five macrobenchmarks show that CNIs can improve the performance by 17-53% on the memory bus and 30-88% on the I/O bus.
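The cachable-queue idea in the abstract (a contiguous region of cache-block-sized entries managed as a circular queue) can be illustrated with a small software analogue. The C sketch below is not the paper's implementation: it models both endpoints as software threads sharing coherent memory, whereas in the paper one endpoint is the network interface itself. The 64-byte block size, the queue depth, and all names (`cq_t`, `cq_enqueue`, `cq_dequeue`) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CQ_BLOCK_BYTES 64u /* assumed cache-block size */
#define CQ_ENTRIES     8u  /* assumed queue depth; must be a power of two */

/* One queue entry occupies exactly one (assumed) cache block. */
typedef struct {
    alignas(CQ_BLOCK_BYTES) uint8_t payload[CQ_BLOCK_BYTES];
} cq_entry_t;

static_assert(sizeof(cq_entry_t) == CQ_BLOCK_BYTES, "one entry per cache block");

/* Hypothetical single-producer/single-consumer circular queue.
 * head and tail sit in separate blocks so each side mostly writes its own. */
typedef struct {
    cq_entry_t entries[CQ_ENTRIES];
    alignas(CQ_BLOCK_BYTES) _Atomic uint32_t head; /* next entry to read  */
    alignas(CQ_BLOCK_BYTES) _Atomic uint32_t tail; /* next entry to write */
} cq_t;

static void cq_init(cq_t *q) {
    memset(q->entries, 0, sizeof q->entries);
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: copy up to one block into the next free entry.
 * Returns false if the message is too large or the queue is full. */
static bool cq_enqueue(cq_t *q, const void *msg, size_t len) {
    if (len > CQ_BLOCK_BYTES) return false;
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == CQ_ENTRIES) return false; /* full */
    cq_entry_t *e = &q->entries[tail % CQ_ENTRIES];
    memset(e->payload, 0, CQ_BLOCK_BYTES);
    memcpy(e->payload, msg, len);
    /* Publish the entry only after its payload is written. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: copy one whole block out of the oldest entry.
 * 'out' must have room for CQ_BLOCK_BYTES bytes. Returns false if empty. */
static bool cq_dequeue(cq_t *q, void *out) {
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) return false; /* empty */
    memcpy(out, q->entries[head % CQ_ENTRIES].payload, CQ_BLOCK_BYTES);
    /* Release the entry back to the producer. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

In this sketch the consumer polls by calling `cq_dequeue` until it returns true. Under a snooping protocol, repeated reads of an unchanged `tail` would typically hit in the consumer's cache, which is the kind of polling-overhead reduction the abstract mentions, though the sketch makes no claim about the specific optimizations the paper evaluates.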
This publication has 20 references indexed in Scilit:
- Efficient support for irregular applications on distributed-memory machines. Published by Association for Computing Machinery (ACM), 1995
- Boosting the performance of hybrid snooping cache protocols. Published by Association for Computing Machinery (ACM), 1995
- Cost-effective parallel computing. Computer, 1995
- Where is time spent in message-passing and shared-memory programs? Published by Association for Computing Machinery (ACM), 1994
- Anatomy of a message in the Alewife multiprocessor. Published by Association for Computing Machinery (ACM), 1993
- Parallel programming in Split-C. Published by Association for Computing Machinery (ACM), 1993
- A tightly-coupled processor-network interface. Published by Association for Computing Machinery (ACM), 1992
- The network architecture of the Connection Machine CM-5 (extended abstract). Published by Association for Computing Machinery (ACM), 1992
- Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 1991
- Supporting systolic and memory communication in iWarp. Published by Association for Computing Machinery (ACM), 1990