VEAL: Virtualized Execution Accelerator for Loops
- 1 June 2008
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 36 (3), 389-400
- https://doi.org/10.1109/isca.2008.33
Abstract
Performance improvement solely through transistor scaling is becoming more and more difficult; thus, it is increasingly common to see domain-specific accelerators used in conjunction with general-purpose processors to achieve future performance goals. There is a serious drawback to accelerators, though: binary compatibility. An application compiled to utilize an accelerator cannot run on a processor without that accelerator, and applications that do not utilize an accelerator will never use it. To overcome this problem, we propose decoupling the instruction set architecture from the underlying accelerators. Computation to be accelerated is expressed using a processor's baseline instruction set, and lightweight dynamic translation maps the representation to whatever accelerators are available in the system. In this paper, we describe the changes to a compilation framework and processor system needed to support this abstraction for an important set of accelerator designs that support innermost loops. In this analysis, we investigate the dynamic overheads associated with abstraction as well as the static/dynamic tradeoffs to improve the dynamic mapping of loop nests. As part of the exploration, we also provide a quantitative analysis of the hardware characteristics of effective loop accelerators. We conclude that using a hybrid static-dynamic compilation approach to map computation onto loop-level accelerators is a practical way to increase computation efficiency, without the overheads associated with instruction set modification.
This publication has 19 references indexed in Scilit:
- Single-dimension software pipelining for multidimensional loops. ACM Transactions on Architecture and Code Optimization, 2007
- Reducing Startup Time in Co-Designed Virtual Machines. ACM SIGARCH Computer Architecture News, 2006
- A loop accelerator for low power embedded VLIW processors. Published by Association for Computing Machinery (ACM), 2004
- A comparative study of modulo scheduling techniques. Published by Association for Computing Machinery (ACM), 2002
- Cycle-time aware architecture synthesis of custom hardware accelerators. Published by Association for Computing Machinery (ACM), 2002
- Dynamic binary translation and optimization. IEEE Transactions on Computers, 2001
- ShiftQ. Published by Association for Computing Machinery (ACM), 2001
- Dynamo. Published by Association for Computing Machinery (ACM), 2000
- Exploiting instruction level parallelism in processors by caching scheduled groups. ACM SIGARCH Computer Architecture News, 1997
- Swing modulo scheduling: a lifetime-sensitive approach. Published by Institute of Electrical and Electronics Engineers (IEEE), 1996