Data remapping for design space optimization of embedded memory systems

1 May 2003

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Embedded Computing Systems

Vol. 2 (2), 186-218
https://doi.org/10.1145/643470.643474

Abstract

In this article, we present a novel linear-time algorithm for data remapping that is (i) lightweight, (ii) fully automated, and (iii) applicable to pointer-centric programming languages with dynamic memory allocation support. All previous work in this area lacks one or more of these features. We then demonstrate a novel application of this algorithm as a key step in optimizing the design of an embedded memory system. Specifically, we show that, by virtue of locality enhancements via data remapping, we may reduce the memory subsystem needs of an application by 50%, and hence concomitantly reduce the associated costs in terms of size, power, and dollar investment (61%). Such a reduction overcomes a key hurdle in designing high-performance embedded computing solutions: memory subsystems are very desirable from a performance standpoint, but their costs have often limited their use in embedded systems. Thus, our innovative approach offers the intriguing possibility of compilers playing a significant role in exploring and optimizing the design space of a memory subsystem for an embedded design. To this end, and in order to properly leverage the improvements afforded by a compiler optimization, we identify a range of measures for quantifying the cost impact of popular notions of locality, prefetching, regularity of memory access, and others. The proposed methodology will become increasingly important, especially as the need for application-specific embedded architectures becomes prevalent. In addition, we demonstrate the wide applicability of data remapping using several existing microprocessors, such as the Pentium and UltraSparc, on which remapping achieves a performance improvement of 20% on average. Similarly, for a parametric research HPL-PD microprocessor, which characterizes the new Itanium machines, we achieve a performance improvement of 28% on average. All of our results are achieved using applications from the DIS, Olden, and SPEC2000 suites of integer and floating-point benchmarks.
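
The abstract does not describe the remapping algorithm itself. As a rough illustration of why regrouping data can improve locality, the sketch below compares how many cache lines a traversal touches when it reads a single field of every record, first in a conventional record-by-record layout and then in a layout where the same field of all records is stored contiguously. This is not the authors' algorithm: the layout functions, the 64-byte line size, the 8-byte field size, and the record counts are all assumptions chosen for illustration.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
FIELD_SIZE = 8   # bytes per field (assumed)


def interleaved_address(record, field, num_fields):
    """Original layout: all fields of one record sit next to each other."""
    return (record * num_fields + field) * FIELD_SIZE


def remapped_address(record, field, num_records):
    """Remapped layout: the same field of all records sits contiguously."""
    return (field * num_records + record) * FIELD_SIZE


def lines_touched(addresses):
    """Count the distinct cache lines covered by a sequence of byte addresses."""
    return len({addr // LINE_SIZE for addr in addresses})


num_records, num_fields = 1024, 8
hot_field = 0  # the hypothetical traversal reads only this field of each record

before = lines_touched(
    interleaved_address(r, hot_field, num_fields) for r in range(num_records)
)
after = lines_touched(
    remapped_address(r, hot_field, num_records) for r in range(num_records)
)
print(before, after)  # prints "1024 128"
```

Under these assumptions each record fills exactly one cache line, so the original layout touches one line per record, while the remapped layout packs eight values of the hot field into each line. The reduction in lines touched is the kind of locality gain that could let a smaller memory subsystem serve the same workload.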
