Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
- 30 October 2003
- proceedings article
- Published by Association for Computing Machinery (ACM)
- pp. 276-286
- https://doi.org/10.1145/951710.951747
Abstract
This paper presents a highly predictable, low-overhead, and yet dynamic memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus caches and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme [4].

Existing scratch-pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags, and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. For example, our previous work in [3] derives a provably optimal static allocation for global and stack variables and achieves a speedup over all earlier methods. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache.

In this paper we present a dynamic allocation method for global and stack data that, for the first time, (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no run-time checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, our results show runtime reductions ranging from 11% to 38%, averaging 31.2%, using no additional hardware support. With hardware support for pseudo-DMA and full DMA, which is already provided in some commercial systems, the runtime reductions increase to 33.4% and 34.2% respectively.
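To make the idea concrete, the sketch below shows roughly how compiler-inserted copy code at a fixed, infrequent program point could stage a hot global array into scratch-pad SRAM before a frequently executed loop and write it back afterward, with no tags and no per-access checks. This is a minimal illustration only: the buffer `scratchpad`, its size, and the function names are assumptions for the example and are not taken from the paper, which derives such placements automatically for global and stack variables.

```c
#include <stdio.h>
#include <string.h>

#define SPM_WORDS 256                      /* assumed scratch-pad capacity (ints) */

static int scratchpad[SPM_WORDS];          /* stands in for the fast SRAM bank    */
static int big_table[SPM_WORDS];           /* hot global, normally placed in DRAM */

int sum_table(void)
{
    /* Compiler-inserted copy-in at a fixed, infrequent program point:
       no software tags and no per-access checks, so every access inside
       the loop below has a predictable latency. */
    memcpy(scratchpad, big_table, sizeof big_table);

    int sum = 0;
    for (int i = 0; i < SPM_WORDS; i++)    /* frequent accesses now hit fast SRAM */
        sum += scratchpad[i];

    /* Compiler-inserted eviction/write-back; since the data was not modified
       here, a real compiler could omit this copy. */
    memcpy(big_table, scratchpad, sizeof big_table);
    return sum;
}

int main(void)
{
    for (int i = 0; i < SPM_WORDS; i++)
        big_table[i] = i;
    printf("sum = %d\n", sum_table());     /* 0 + 1 + ... + 255 = 32640 */
    return 0;
}
```

With pseudo-DMA or full DMA support, the `memcpy` calls in this sketch would be replaced by hardware-assisted block transfers, which is where the additional runtime reductions reported in the abstract come from.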
This publication has 11 references indexed in Scilit:
- An optimal memory allocation scheme for scratch-pad-based embedded systems. ACM Transactions on Embedded Computing Systems, 2002
- Scratchpad memory. Published by Association for Computing Machinery (ACM), 2002
- Heterogeneous memory management for embedded systems. Published by Association for Computing Machinery (ACM), 2001
- Storage allocation for embedded processors. Published by Association for Computing Machinery (ACM), 2001
- Dynamic management of scratch-pad memory space. Published by Association for Computing Machinery (ACM), 2001
- A fully associative software-managed cache design. Published by Association for Computing Machinery (ACM), 2000
- Power analysis and minimization techniques for embedded DSP software. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1997
- Software Caching and Computation Migration in Olden. Journal of Parallel and Distributed Computing, 1996
- Fine-grain access control for distributed shared memory. Published by Association for Computing Machinery (ACM), 1994
- A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 1966