Data remapping for design space optimization of embedded memory systems
- 1 May 2003
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Embedded Computing Systems
- Vol. 2 (2), 186-218
- https://doi.org/10.1145/643470.643474
Abstract
In this article, we present a novel linear time algorithm for data remapping, that is, (i) lightweight; (ii) fully automated; and (iii) applicable in the context of pointer-centric programming languages with dynamic memory allocation support. All previous work in this area lacks one or more of these features. We proceed to demonstrate a novel application of this algorithm as a key step in optimizing the design of an embedded memory system. Specifically, we show that by virtue of locality enhancements via data remapping, we may reduce the memory subsystem needs of an application by 50%, and hence concomitantly reduce the associated costs in terms of size, power, and dollar-investment (61%). Such a reduction overcomes key hurdles in designing high-performance embedded computing solutions. Namely, memory subsystems are very desirable from a performance standpoint, but their costs have often limited their use in embedded systems. Thus, our innovative approach offers the intriguing possibility of compilers playing a significant role in exploring and optimizing the design space of a memory subsystem for an embedded design. To this end and in order to properly leverage the improvements afforded by a compiler optimization, we identify a range of measures for quantifying the cost-impact of popular notions of locality, prefetching, regularity of memory access, and others. The proposed methodology will become increasingly important, especially as the needs for application specific embedded architectures become prevalent. In addition, we demonstrate the wide applicability of data remapping using several existing microprocessors, such as the Pentium and UltraSparc. Namely, we show that remapping can achieve a performance improvement of 20% on the average. Similarly, for a parametric research HPL-PD microprocessor, which characterizes the new Itanium machines, we achieve a performance improvement of 28% on average.
All of our results are achieved using applications from the DIS, Olden and SPEC2000 suites of integer and floating point benchmarks.