Parallel execution of radix sort program using fine-grain communication
- 23 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 136-145
- https://doi.org/10.1109/pact.1997.644010
Abstract
The report presents empirical results of fine-grain communication on the 80-processor EM-X distributed-memory multiprocessor. EM-X has hardware support for low-latency, high-throughput fine-grain communication: packet generation integrated into the instruction execution pipeline for single-cycle communication overhead, direct memory access for remote references, and rapid context switching for latency tolerance. The authors study the fine-grain communication performance of integer radix sort, a code with irregular communication, on EM-X, and compare it to the Fujitsu AP1000+ and the Cray Server CS6400. The experimental results indicate that EM-X achieves high throughput and low overhead for fine-grain communication. Whereas EM-X's communication performance scales perfectly as the number of processors increases, the other, coarse-grain message-passing machines exhibit fluctuation and performance degradation in larger configurations due to network contention.
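To illustrate why integer radix sort has irregular communication, here is a minimal sequential sketch in Python. It is not the authors' EM-X program. The function names, the 8-bit digit width, and the block distribution of keys over processors are all assumptions made for illustration. Each pass builds a histogram of digits, turns it into starting offsets, and then scatters every key to its destination slot. In a distributed-memory version, that scatter is the step where each key may become one small remote write to a data-dependent processor.

```python
def destinations(keys, shift, mask):
    """Return the stable destination index of each key for one digit pass."""
    # Histogram of the current digit.
    counts = [0] * (mask + 1)
    for k in keys:
        counts[(k >> shift) & mask] += 1
    # Exclusive prefix sum: first output slot for each digit value.
    offsets = []
    total = 0
    for c in counts:
        offsets.append(total)
        total += c
    # Assign slots in input order, which keeps the pass stable.
    dest = []
    for k in keys:
        d = (k >> shift) & mask
        dest.append(offsets[d])
        offsets[d] += 1
    return dest


def radix_sort(keys, radix_bits=8, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits."""
    keys = list(keys)
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        dest = destinations(keys, shift, mask)
        out = [0] * len(keys)
        for k, j in zip(keys, dest):
            out[j] = k  # scatter: a remote write in a distributed version
        keys = out
    return keys


def remote_writes(keys, num_procs, radix_bits=8, shift=0):
    """Count scatter writes in one pass that cross a processor boundary.

    Assumes (hypothetically) that keys are block-distributed, so that
    index i lives on processor i // block.
    """
    if not keys:
        return 0
    block = -(-len(keys) // num_procs)  # ceiling division
    mask = (1 << radix_bits) - 1
    dest = destinations(keys, shift, mask)
    return sum(1 for i, j in enumerate(dest) if i // block != j // block)
```

For example, `remote_writes([3, 2, 1, 0], 2, radix_bits=2)` counts four cross-processor writes, while the already ordered input `[0, 1, 2, 3]` produces none. The traffic pattern depends entirely on the key values.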
This publication has 13 references indexed in Scilit:
- The EM-X parallel computer. Published by Association for Computing Machinery (ACM), 1995
- AP1000+. Published by Association for Computing Machinery (ACM), 1994
- TAM - A Compiler Controlled Threaded Abstract Machine. Journal of Parallel and Distributed Computing, 1993
- Design and implementation of a circular omega network in the EM-4. Parallel Computing, 1993
- Thread-based programming for the EM-4 hybrid dataflow machine. Published by Association for Computing Machinery (ACM), 1992
- Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 1991
- A comparison of sorting algorithms for the connection machine CM-2. Published by Association for Computing Machinery (ACM), 1991
- I-structures: data structures for parallel computing. ACM Transactions on Programming Languages and Systems, 1989
- An architecture of a dataflow single chip processor. Published by Association for Computing Machinery (ACM), 1989
- Data parallel algorithms. Communications of the ACM, 1986