Abstract
Given a fixed CPU architecture and a fixed DRAM timing specification, there is still a large design space for a DRAM system organization. Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and organizations, turnaround overhead, the memory-controller page protocol, and algorithms for assigning request priorities and scheduling requests dynamically. Across this design space we see a wide variation in application execution times; for example, execution times for the SPEC CPU 2000 integer suite on a 2-way ganged Direct Rambus organization (32 data bits) with 64-byte bursts are 10-20% lower than execution times on an otherwise identical configuration that uses 32-byte bursts. These two system configurations are relatively close to each other in the design space; performance differences become even more pronounced for designs further apart. This paper characterizes the sources of overhead in high-performance DRAM systems and investigates the most effective ways to reduce a system's exposure to performance loss. In particular, we look at mechanisms to increase a system's support for concurrent transactions, mechanisms to reduce request latency, and mechanisms to reduce the "system overhead": the portion of the primary memory system's overhead that is due not to DRAM latency but to factors such as bus turnaround time and request-queueing inefficiencies caused by read/write request interleaving. Our simulator models a 2 GHz, highly aggressive out-of-order uniprocessor. The interface to the memory system is fully non-blocking, supporting up to 32 outstanding misses at both the level-1 and level-2 caches, with split-transaction busses to all DRAM banks.
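To make the design-space framing concrete, the sketch below enumerates the tunable parameters listed above and illustrates one reason burst size matters: with a fixed per-request overhead, smaller bursts pay that overhead more often when fetching a full cache line. This is a minimal illustration, not the paper's simulator; the struct layout, the fixed-overhead cost model, and all numeric values are assumptions chosen for the example.

#include <stdio.h>

/* Illustrative sketch of the DRAM-system design space (all names and
 * values are hypothetical, not taken from the paper's simulator). */

typedef enum { PAGE_OPEN, PAGE_CLOSED } page_policy_t;
typedef enum { SCHED_FIFO, SCHED_READS_FIRST } sched_policy_t;

typedef struct {
    int            num_channels;        /* independent memory channels      */
    int            channel_width_bits;  /* data bits per (ganged) channel   */
    int            burst_bytes;         /* bytes transferred per burst      */
    int            queue_depth;         /* controller request-queue entries */
    int            turnaround_cycles;   /* assumed fixed per-request cost   */
    page_policy_t  page_policy;         /* open- vs. closed-page protocol   */
    sched_policy_t sched_policy;        /* request-scheduling algorithm     */
} dram_config_t;

/* Channel cycles one burst occupies: burst size divided by bytes moved
 * per cycle (two transfers per cycle, assuming DDR-style signaling). */
static int burst_occupancy_cycles(const dram_config_t *c)
{
    int bytes_per_cycle = 2 * c->channel_width_bits / 8;
    return c->burst_bytes / bytes_per_cycle;
}

/* Cycles to fetch one cache line: each burst pays the fixed per-request
 * overhead plus its data-transfer time, so halving the burst size
 * doubles how often the overhead is paid for the same line. */
static int cycles_per_line(const dram_config_t *c, int line_bytes)
{
    int bursts = (line_bytes + c->burst_bytes - 1) / c->burst_bytes;
    return bursts * (c->turnaround_cycles + burst_occupancy_cycles(c));
}

int main(void)
{
    /* Two nearby points in the design space: a 2-way ganged 32-bit
     * organization with 64-byte vs. 32-byte bursts. */
    dram_config_t a = { 1, 32, 64, 32, 2, PAGE_OPEN, SCHED_READS_FIRST };
    dram_config_t b = a;
    b.burst_bytes = 32;

    printf("64B bursts: %d cycles per 64B line\n", cycles_per_line(&a, 64));
    printf("32B bursts: %d cycles per 64B line\n", cycles_per_line(&b, 64));
    return 0;
}

Under these assumed numbers the 64-byte-burst configuration fetches a 64-byte line in 10 channel cycles versus 12 for the 32-byte-burst configuration, a gap of the same rough magnitude as the 10-20% execution-time difference reported above, though the abstract's figure reflects whole-program behavior rather than this single-line cost model.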