Concurrency, latency, or system overhead: Which has the largest impact on uniprocessor DRAM-system performance?
- 13 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636897, p. 62-71
- https://doi.org/10.1109/isca.2001.937433
Abstract
Given a fixed CPU architecture and a fixed DRAM timing specification, there is still a large design space for a DRAM system organization. Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and organizations, turnaround overhead, memory-controller page protocol, algorithms for assigning request priorities and scheduling requests dynamically, etc. In this design space, we see a wide variation in application execution times; for example, execution times for the SPEC CPU 2000 integer suite on a 2-way ganged Direct Rambus organization (32 data bits) with 64-byte bursts are 10-20% lower than execution times on an otherwise identical configuration that uses 32-byte bursts. This represents two system configurations that are relatively close to each other in the design space; performance differences become even more pronounced for designs further apart. This paper characterizes the sources of overhead in high-performance DRAM systems and investigates the most effective ways to reduce a system's exposure to performance loss. In particular, we look at mechanisms to increase a system's support for concurrent transactions, mechanisms to reduce request latency, and mechanisms to reduce the "system overhead" (the portion of the primary memory system's overhead that is not due to DRAM latency but rather to things like turnaround time, request queueing inefficiencies due to read/write request interleaving, etc.). Our simulator models a 2 GHz, highly aggressive out-of-order uniprocessor. The interface to the memory system is fully non-blocking, supporting up to 32 outstanding misses at both the level-1 and level-2 caches and split-transaction busses to all DRAM banks.
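As a rough illustration of the design space the abstract describes, the sketch below enumerates combinations of a few of the listed parameters and computes how many data transfers one burst occupies on a channel of a given width. The class `DramConfig`, the function `design_space`, and the channel count of 1 are hypothetical names and values chosen for this example; they are not taken from the paper or its simulator, and only the 32-bit width and the 32-byte and 64-byte burst sizes come from the abstract.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DramConfig:
    """Hypothetical container for a few of the parameters the abstract lists."""
    channels: int      # number of independent memory channels (assumed value)
    data_bits: int     # data width of one (possibly ganged) channel
    burst_bytes: int   # bytes moved per burst

    def transfers_per_burst(self) -> int:
        # Each data transfer moves data_bits / 8 bytes across the channel.
        return self.burst_bytes * 8 // self.data_bits


def design_space(channels, data_bits, burst_bytes):
    """Enumerate every combination of the supplied parameter values."""
    return [DramConfig(c, w, b)
            for c, w, b in product(channels, data_bits, burst_bytes)]


# The two configurations compared in the abstract differ only in burst size.
# channels=1 is an assumption made for this sketch.
short_burst = DramConfig(channels=1, data_bits=32, burst_bytes=32)
long_burst = DramConfig(channels=1, data_bits=32, burst_bytes=64)
```

Under these assumptions a 64-byte burst occupies twice as many data transfers as a 32-byte burst on the same 32-bit channel. The sketch only counts transfers and configurations; it says nothing about execution time, which the paper measures through simulation.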