A cross-input adaptive framework for GPU program optimizations

Abstract

Recent years have seen a trend toward using graphics processing units (GPUs) as accelerators for general-purpose computing. The inexpensive, single-chip, massively parallel architecture of GPUs has evidently brought factors of speedup to many numerical applications. However, developing a high-quality GPU application is challenging because of the large optimization space and the complex, unpredictable effects of optimizations on GPU program performance. Recently, several studies have attempted to use empirical search to help the optimization. Although those studies have shown promising results, one important factor in the optimization, program inputs, has remained unexplored. In this work, we initiate the exploration in this new dimension. By conducting a series of measurements, we find that the ability to adapt to program inputs is important for some applications to achieve their best performance on GPUs. In light of these findings, we develop an input-adaptive optimization framework, namely G-ADAPT, to address the influence by constructing cross-input predictive models that automatically predict the (near-)optimal configurations for an arbitrary input to a GPU program. The results demonstrate the promise of the framework in serving as a tool to alleviate the productivity bottleneck in GPU programming.
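To make the idea of a cross-input predictive model concrete, the following is a minimal sketch, not G-ADAPT's actual method. It assumes a single input feature (the problem size), a single tunable parameter (threads per block), and a nearest-neighbor lookup as a stand-in predictor. All names (`best_config`, `train`, `predict`, `toy_run_time`) are hypothetical, and `toy_run_time` is a made-up cost model that replaces real kernel timing.

```python
from typing import Callable, List, Sequence, Tuple

Config = Tuple[int, ...]      # assumed: (threads_per_block,)
Features = Tuple[float, ...]  # assumed: (input_size,)
Model = List[Tuple[Features, Config]]


def best_config(run_time: Callable[[Features, Config], float],
                features: Features,
                candidates: Sequence[Config]) -> Config:
    """Empirical search: evaluate every candidate on one input, keep the fastest."""
    return min(candidates, key=lambda cfg: run_time(features, cfg))


def train(run_time: Callable[[Features, Config], float],
          training_inputs: Sequence[Features],
          candidates: Sequence[Config]) -> Model:
    """Record the best configuration found for each training input."""
    return [(f, best_config(run_time, f, candidates)) for f in training_inputs]


def predict(model: Model, features: Features) -> Config:
    """Stand-in predictor: reuse the configuration of the nearest training input."""
    def squared_distance(entry: Tuple[Features, Config]) -> float:
        return sum((a - b) ** 2 for a, b in zip(entry[0], features))
    return min(model, key=squared_distance)[1]


def toy_run_time(features: Features, cfg: Config) -> float:
    """Hypothetical cost model: the block size closest to sqrt(n) is fastest.

    A real framework would launch and time the GPU kernel here instead.
    """
    n = features[0]
    block = cfg[0]
    return abs(block - n ** 0.5)


candidates = [(64,), (128,), (256,)]
training_inputs = [(4096.0,), (16384.0,), (65536.0,)]
model = train(toy_run_time, training_inputs, candidates)

# An unseen input gets the configuration of its nearest training input.
chosen = predict(model, (60000.0,))
```

The design point the sketch illustrates is the split between an offline phase, where empirical search is run on a set of training inputs, and a prediction phase, where a new input receives a configuration without being searched again.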
