MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks

Abstract
The hybrid memory model of clusters of multiprocessors raises two issues: programming model and performance. Many parallel programs have been written using the MPI standard. To evaluate the relevance of hybrid models for existing MPI codes, we compare a unified model (MPI) and a hybrid one (fine-grain OpenMP parallelization after profiling) for the NAS 2.3 benchmarks on two IBM SP systems. Which model is superior depends on (1) the level of shared-memory parallelization, (2) the communication patterns, and (3) the memory access patterns. The relative speeds of the main architecture components (CPU, memory, and network) are of tremendous importance in selecting one model. With the hybrid model used here, our results show that the unified MPI approach is better for most of the benchmarks. The hybrid approach becomes better only when fast processors make communication performance significant and the level of parallelization is sufficient.
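
To make the two programming models concrete, the following is a minimal, illustrative sketch (not code from the paper or the NAS 2.3 sources) of the hybrid pattern the abstract describes: MPI between nodes, with fine-grain OpenMP parallelization applied to a compute loop that profiling would identify as a hot spot. All identifiers are hypothetical.

/* Hypothetical hybrid MPI+OpenMP sketch: MPI across processes,
 * OpenMP fine-grain parallelism inside each process. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    int provided, rank;
    static double a[N], b[N];
    double local_sum = 0.0, global_sum = 0.0;

    /* FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fine-grain OpenMP parallelization of a hot compute loop. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)(i + rank);
        b[i] = 2.0 * a[i];
        local_sum += a[i] * b[i];
    }

    /* Communication stays outside the parallel region (master thread). */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global dot product = %e\n", global_sum);

    MPI_Finalize();
    return 0;
}

In the unified MPI model, the same loop would instead be distributed over more MPI processes per node, with no OpenMP directives; the abstract's comparison is between these two decompositions of the same computation.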
