Efficient Breadth-First Search on the Cell/BE Processor
- 29 August 2008
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 19 (10) , 1381-1395
- https://doi.org/10.1109/tpds.2007.70811
Abstract
Multi-core processors are a shift of paradigm in computer architecture that promises a dramatic increase in performance. But they also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges involved in designing a breadth-first search (BFS) algorithm for the Cell/B.E. processor. The proposed methodology combines a high-level algorithmic design that captures the machine-independent aspects, to guarantee portability with performance to future processors, with an implementation that embeds processor-specific optimizations. Using a fine-grained global coordination strategy derived by the bulk-synchronous parallel (BSP) model, we have determined an accurate performance model that has guided the implementation and the optimization of our algorithm. Our experiments on a pre-production Cell/B.E. board running at 3.2 GHz, show almost linear speedups when using multiple synergistic processing elements, and an impressive level of performance when compared to other processors. On graphs which offer sufficient parallelism, the Cell/B.E. is typically an order of magnitude faster than conventional processors, such as the AMD Opteron and the Intel Pentium 4 and Woodcrest, and custom-designed architectures, such as the MTA-2 and BlueGene/L.Keywords
This publication has 42 references indexed in Scilit:
- Implementation of mixed precision in solving systems of linear equations on the Cell processorConcurrency and Computation: Practice and Experience, 2007
- The Problem with ThreadsComputer, 2006
- Petascale computational systemsComputer, 2006
- Community detection in complex networks using extremal optimizationPhysical Review E, 2005
- The GeForce 6800IEEE Micro, 2005
- Niagara: A 32-Way Multithreaded Sparc ProcessorIEEE Micro, 2005
- Montecito: A Dual-Core, Dual-Thread Itanium ProcessorIEEE Micro, 2005
- Horus: Large-Scale Symmetric Multiprocessing for Opteron SystemsIEEE Micro, 2005
- Finding community structure in very large networksPhysical Review E, 2004
- Algorithms for scalable synchronization on shared-memory multiprocessorsACM Transactions on Computer Systems, 1991