OpenMP to GPGPU

14 February 2009

proceedings article
Published by Association for Computing Machinery (ACM)

Vol. 44 (4), 101-110
https://doi.org/10.1145/1504176.1504194

Abstract

GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from NVIDIA offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, leading to performance improvements of up to 50X over the unoptimized translation (up to 328X over serial).
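To make the idea of translating an OpenMP loop into a GPU kernel more concrete, the following is a minimal sketch in plain C, not the translator described in the paper. It shows a Jacobi-style sweep written as an OpenMP parallel loop, next to a kernel-shaped version in which each logical thread derives its loop iteration from a block index and a thread index, as CUDA kernels commonly do. All function names, the flat array layout, and the sequential "launch" loop are assumptions made for illustration; a real CUDA version would run the kernel body on the device.

```c
#include <assert.h>
#include <stddef.h>

/* OpenMP-style input: the outer loop of a Jacobi sweep is marked parallel.
 * Arrays are flat, row-major n x n grids (an assumption of this sketch). */
void jacobi_openmp(const float *in, float *out, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++) {
        for (int j = 1; j < n - 1; j++) {
            out[i * n + j] = 0.25f * (in[(i - 1) * n + j] + in[(i + 1) * n + j] +
                                      in[i * n + j - 1] + in[i * n + j + 1]);
        }
    }
}

/* Kernel-shaped form: the body of the parallel loop, run by one logical
 * thread. block_idx, block_dim and thread_idx stand in for CUDA's
 * blockIdx.x, blockDim.x and threadIdx.x. */
void jacobi_kernel_body(const float *in, float *out, int n,
                        int block_idx, int block_dim, int thread_idx) {
    int tid = block_idx * block_dim + thread_idx; /* global thread id */
    int i = tid + 1;                              /* original loop starts at 1 */
    if (i < n - 1) {                              /* guard surplus threads */
        for (int j = 1; j < n - 1; j++) {
            out[i * n + j] = 0.25f * (in[(i - 1) * n + j] + in[(i + 1) * n + j] +
                                      in[i * n + j - 1] + in[i * n + j + 1]);
        }
    }
}

/* Hypothetical host-side emulation of a kernel launch: blocks and threads
 * are visited sequentially here, whereas a GPU would run them in parallel. */
void jacobi_kernel_emulated(const float *in, float *out, int n, int block_dim) {
    int iterations = n - 2;
    int grid_dim = (iterations + block_dim - 1) / block_dim; /* round up */
    for (int b = 0; b < grid_dim; b++) {
        for (int t = 0; t < block_dim; t++) {
            jacobi_kernel_body(in, out, n, b, block_dim, t);
        }
    }
}
```

In this mapping, neighbouring threads work on neighbouring rows, so at any given `j` they touch elements `n` apart in memory. On CUDA hardware, global memory access is generally most efficient when neighbouring threads touch neighbouring addresses, which is one reason a direct translation like this can leave performance on the table.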
