MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks
- 1 January 2000
- Proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The hybrid memory model of clusters of multiprocessors raises two issues: the programming model and performance. Many parallel programs have been written using the MPI standard. To evaluate the relevance of hybrid models for existing MPI codes, we compare a unified model (MPI) with a hybrid one (fine-grain OpenMP parallelization after profiling) for the NAS 2.3 benchmarks on two IBM SP systems. The superiority of one model depends on (1) the level of shared-memory parallelization, (2) the communication patterns, and (3) the memory access patterns. The relative speeds of the main architecture components (CPU, memory, and network) are of tremendous importance in selecting a model. With the hybrid model used, our results show that a unified MPI approach is better for most of the benchmarks. The hybrid approach becomes better only when fast processors make the communication performance significant and the level of parallelization is sufficient.
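To make the two models concrete, the following is a minimal C sketch of the hybrid approach the abstract describes: MPI ranks communicate between nodes, while a hot loop (the kind profiling would identify) is parallelized fine-grain with OpenMP inside each rank. The loop body, array size, and reduction are illustrative assumptions, not code from the paper; the unified model would run the same loop with one MPI rank per processor and no OpenMP pragma.

```c
/* Hypothetical sketch of the hybrid MPI+OpenMP model: MPI across nodes,
   fine-grain OpenMP parallelization of a profiled hot loop within a node.
   The daxpy-style loop and size N are illustrative, not from the paper. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000  /* illustrative per-rank problem size */

int main(int argc, char **argv)
{
    int provided, rank, nprocs;

    /* Request funneled threading: only the main thread makes MPI calls,
       a common choice for loop-level (fine-grain) hybrid codes. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *x = malloc(N * sizeof(double));
    double *y = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* Fine-grain OpenMP parallelization of the compute loop; in the
       unified model this loop would run single-threaded per rank. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < N; i++) {
        y[i] += 3.0 * x[i];
        local_sum += y[i];
    }

    /* Communication stays at the MPI level, outside the threaded region. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%f\n",
               nprocs, omp_get_max_threads(), global_sum);

    free(x); free(y);
    MPI_Finalize();
    return 0;
}
```

With a typical MPI compiler wrapper this would build as, e.g., `mpicc -fopenmp hybrid.c`, with the thread count per rank set via `OMP_NUM_THREADS`. The trade-off the paper measures is visible here: the hybrid model reduces the number of MPI ranks (and thus messages) at the cost of depending on how much of the computation the OpenMP pragmas actually cover.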