A cross-input adaptive framework for GPU program optimizations
- 1 May 2009
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Recent years have seen a trend of using graphics processing units (GPUs) as accelerators for general-purpose computing. The inexpensive, single-chip, massively parallel architecture of GPUs has evidently brought factors of speedup to many numerical applications. However, developing a high-quality GPU application is challenging because of the large optimization space and the complex, unpredictable effects of optimizations on GPU program performance. Recently, several studies have attempted to use empirical search to help the optimization. Although those studies have shown promising results, one important factor in the optimization, program inputs, has remained unexplored. In this work, we initiate the exploration in this new dimension. By conducting a series of measurements, we find that the ability to adapt to program inputs is important for some applications to achieve their best performance on GPUs. In light of the findings, we develop an input-adaptive optimization framework, namely G-ADAPT, to address this influence by constructing cross-input predictive models that automatically predict the (near-)optimal configurations for an arbitrary input to a GPU program. The results demonstrate the promise of the framework in serving as a tool to alleviate the productivity bottleneck in GPU programming.
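The general idea of a cross-input predictive model can be illustrated with a minimal sketch. This is not G-ADAPT's actual method, which the abstract does not detail; it assumes a plain exhaustive empirical search over a small configuration set, followed by a 1-nearest-neighbour lookup on input features. All names here (`best_config`, `train`, `predict`, `toy_cost`) and the cost function standing in for a timed kernel run are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Config = Tuple[int, ...]      # hypothetical, e.g. (threads_per_block,)
Features = Tuple[float, ...]  # hypothetical, e.g. (problem_size,)
Cost = Callable[[Features, Config], float]


def best_config(cost: Cost, features: Features, configs: Sequence[Config]) -> Config:
    """Empirical search: try every configuration and keep the cheapest one."""
    return min(configs, key=lambda c: cost(features, c))


def train(cost: Cost, inputs: Sequence[Features],
          configs: Sequence[Config]) -> List[Tuple[Features, Config]]:
    """Record the best configuration found for each training input."""
    return [(f, best_config(cost, f, configs)) for f in inputs]


def predict(model: List[Tuple[Features, Config]], features: Features) -> Config:
    """Predict a configuration for a new input from its nearest training input."""
    def sq_dist(a: Features, b: Features) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda entry: sq_dist(entry[0], features))[1]


def toy_cost(features: Features, config: Config) -> float:
    """Assumed stand-in for a measured run time (not real GPU behaviour):
    cheapest when the block size is close to sqrt(problem size)."""
    (n,), (block,) = features, config
    return abs(block - n ** 0.5)


# Usage: search offline on three training inputs, then predict for an unseen one.
configs = [(32,), (64,), (128,), (256,)]
model = train(toy_cost, [(1024.0,), (16384.0,), (65536.0,)], configs)
chosen = predict(model, (20000.0,))
```

In a real setting the cost function would launch and time the GPU program, and the predictor could be any model that generalizes across inputs; the nearest-neighbour lookup is used here only because it is easy to follow.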