On the Architectural Requirements for Efficient Execution of Graph Algorithms
- 3 August 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 547-556
- https://doi.org/10.1109/icpp.2005.55
Abstract
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can speed up on SMPs, the systems' reliance on cache microprocessors limits performance. The MTA's latency-tolerant processors and hardware support for fine-grain synchronization make performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affect algorithm development, ease of programming, performance, and scalability.
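To make the memory-access pattern named in the abstract concrete, here is a minimal sketch of list ranking by pointer jumping (often attributed to Wyllie). This is an illustrative assumption, not necessarily the algorithm the paper evaluates. The function name `list_rank` and the input convention (`succ[i]` is the successor of node `i`, and the tail points to itself) are hypothetical. The rounds are simulated sequentially; on a parallel machine each iteration of the inner step would run concurrently.

```python
def list_rank(succ):
    """Return, for each node, its distance to the tail of a linked list.

    Assumed convention: succ[i] is the index of the node after i,
    and the tail node points to itself.
    """
    n = len(succ)
    nxt = list(succ)
    # Every non-tail node starts one hop from its successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]

    # Repeat until every pointer has reached the tail (the only fixed point).
    # Each round doubles the distance a pointer spans, so roughly
    # log2(n) rounds are needed.
    while any(nxt[nxt[i]] != nxt[i] for i in range(n)):
        # Read from the old arrays and write to new ones to mimic
        # a synchronous parallel step.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


if __name__ == "__main__":
    # The list 2 -> 1 -> 0 -> 3, stored out of order in memory.
    print(list_rank([3, 0, 1, 3]))
```

The reads `rank[nxt[i]]` and `nxt[nxt[i]]` follow pointers to arbitrary positions in the arrays, which is the kind of non-contiguous, low-locality access the abstract says cache-based SMPs handle poorly.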