OpenMP to GPGPU
- 14 February 2009
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 44 (4), 101-110
- https://doi.org/10.1145/1504176.1504194
Abstract
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although NVIDIA's Compute Unified Device Architecture (CUDA) programming model offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. We identify several key transformation techniques that enable efficient GPU global memory access and thereby high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, yielding performance improvements of up to 50X over the unoptimized translation (up to 328X over serial execution).
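The abstract does not show what a translated loop looks like. As a rough illustration only, the C sketch below mimics the basic idea of OpenMP-to-CUDA translation: each iteration of an `omp parallel for` loop becomes one GPU thread, which computes its iteration index from its block and thread ids. It uses a simplified 1-D Jacobi-style update, not the paper's JACOBI kernel. CUDA's `blockIdx`, `blockDim`, and `threadIdx` are emulated by a hypothetical `thread_ctx` struct so the code runs on a CPU. The names `jacobi_openmp`, `jacobi_kernel`, and `launch_jacobi` are illustrative and are not the paper's translator output. The global-memory optimizations the paper describes are not modeled.

```c
#include <assert.h>

/* Input side: one Jacobi-style sweep over the interior of a 1-D array,
 * written as an OpenMP parallel-for loop. Compilers ignore the pragma
 * unless OpenMP is enabled (e.g. -fopenmp), in which case the loop runs
 * in parallel; the result is the same either way. */
void jacobi_openmp(const double *a, double *b, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++) {
        b[i] = (a[i - 1] + a[i + 1]) / 2.0;
    }
}

/* Hypothetical stand-in for CUDA's built-in index variables. */
typedef struct {
    int block_idx;  /* plays the role of blockIdx.x  */
    int block_dim;  /* plays the role of blockDim.x  */
    int thread_idx; /* plays the role of threadIdx.x */
} thread_ctx;

/* Output side: a kernel body in which each thread owns one loop
 * iteration. The +1 offset starts at the first interior point, and the
 * guard discards threads past the loop bound, because the grid is
 * rounded up to a whole number of blocks. */
void jacobi_kernel(thread_ctx t, const double *a, double *b, int n) {
    int i = t.block_idx * t.block_dim + t.thread_idx + 1;
    if (i < n - 1) {
        b[i] = (a[i - 1] + a[i + 1]) / 2.0;
    }
}

/* Emulates a kernel launch with grid_dim blocks of block_dim threads by
 * running every (block, thread) pair sequentially. On a GPU they would
 * run concurrently, which is safe here because each thread writes a
 * distinct b[i] and only reads a[]. */
void launch_jacobi(const double *a, double *b, int n, int block_dim) {
    assert(block_dim > 0);
    int iters = n - 2;  /* number of interior points */
    if (iters <= 0) {
        return;
    }
    int grid_dim = (iters + block_dim - 1) / block_dim; /* ceiling division */
    for (int blk = 0; blk < grid_dim; blk++) {
        for (int thr = 0; thr < block_dim; thr++) {
            thread_ctx t = { blk, block_dim, thr };
            jacobi_kernel(t, a, b, n);
        }
    }
}
```

For any positive block size, `launch_jacobi(a, b, n, block_dim)` should leave `b` identical to what `jacobi_openmp(a, b, n)` produces. The block size only changes how iterations are grouped, not which iterations run.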