Clock rate versus IPC

1 January 2000

proceedings article
Published by Association for Computing Machinery (ACM)

Vol. 28 (2), 248-259
https://doi.org/10.1145/339647.339691

Abstract

The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing improvements in clock rates and poor wire scaling as semiconductor devices shrink, the achievable performance growth of conventional microarchitectures will slow substantially. In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay. Using the results of these models, we measure the simulated performance (estimating both clock rate and IPC) of an aggressive out-of-order microarchitecture as it is scaled from a 250nm technology to a 35nm technology. We perform this analysis for three clock scaling targets and two microarchitecture scaling strategies: pipeline scaling and capacity scaling. We find that no scaling strategy permits annual performance improvements of better than 12.5%, which is far worse than the annual 50-60% to which we have grown accustomed.

For the past decade, microprocessors have been improving in overall performance at a rate of approximately 50-60% per year. These substantial performance improvements have been mined from two sources. First, designers have been increasing clock rates at a rapid rate, both by scaling technology and by reducing the number of levels of logic per cycle. Second, designers have been exploiting the increasing number of transistors on a chip, plus improvements in compiler technology, to improve instruction throughput (IPC). Although designers have generally opted to emphasize one over the other, both clock rates and IPC have been improving consistently.
In Figure 1, we show that while some designers have chosen to optimize the design for fast clocks (Compaq Alpha), and others have optimized their design for high instruction throughput (HP PA-RISC), the past decade's performance increases have been a function of both.

Achieving high performance in future microprocessors will be a tremendous challenge, as both components of performance improvement are facing emerging technology-driven limitations. Designers will soon be unable to sustain clock speed improvements at the past decade's annualized rate of 50% per year. We find that the rate of clock speed improvement must soon drop to scaling linearly with minimum gate length, between 12% and 17% per year.

Compensating for the slower clock growth by increasing sustained IPC proportionally will be difficult. Wire delays will limit the ability of conventional microarchitectures to improve instruction throughput. Microprocessor cores will soon face a new constraint, one in which they are communication bound on the die instead of capacity bound. As feature sizes shrink, and wires become slower relative to logic, the amount of state that can be accessed in a single clock cycle will cease to grow, and will eventually begin to decline. Increases in instruction-level parallelism will be limited by the amount of state reachable in a cycle, not by the number of transistors that can be manufactured on a chip.

For conventional microarchitectures implemented in future technologies, our results show that, as wire delays grow relative to gate delays, improvements in clock rate and IPC become directly antagonistic. This fact limits the performance achievable by any conventional microarchitecture. In such a world, designers are faced with a difficult choice: increase the clock rate aggressively at the cost of reducing IPC, or mitigate the decline in IPC by slowing the rate of clock speed growth.
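
The growth figures quoted above can be related with simple arithmetic. The sketch below is illustrative only and is not taken from the paper's models: it assumes that performance is the product of clock rate and IPC, so that annual growth factors multiply, and the function names are hypothetical. It computes the IPC growth that would be needed to hold overall growth at 50% per year when clock growth is 12% to 17% per year, and how a 12.5% annual rate compounds over fourteen years.

```python
# Illustrative arithmetic only. Assumption: performance = clock rate * IPC,
# so annual growth factors multiply. Function names are hypothetical.

def required_ipc_growth(perf_growth: float, clock_growth: float) -> float:
    """Annual IPC growth needed so that clock and IPC together reach perf_growth."""
    return (1.0 + perf_growth) / (1.0 + clock_growth) - 1.0


def cumulative_gain(annual_growth: float, years: int) -> float:
    """Total speedup after compounding an annual growth rate for `years` years."""
    return (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # Clock growth range quoted in the text: 12% to 17% per year.
    for clock in (0.12, 0.17):
        ipc = required_ipc_growth(0.50, clock)
        print(f"clock +{clock:.0%}/yr needs IPC +{ipc:.1%}/yr for 50%/yr overall")

    # Compare the 12.5% bound with the historical 50% rate over fourteen years.
    for rate in (0.125, 0.50):
        print(f"{rate:.1%}/yr over 14 years: {cumulative_gain(rate, 14):.1f}x")
```

Under these assumptions, IPC would have to grow by roughly 28% to 34% per year to make up for the slower clock, which is the compensation the text argues wire delays will make difficult.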
In this paper, we explore the scalability of microprocessor cores as technology shrinks from the current 250nm feature sizes to the projected 35nm in 2014. With detailed wire and component models, we show that today's designs scale poorly with technology, improving at best 12.5% per year over the next fourteen years. We show that designers must select among deeper pipelines, smaller structures, or slower clocks, and that none of these choices, nor the best combination, will result in scalable performance. Whether designers choose an aggressive clock and lower IPC, or a slower clock and a higher IPC, today's designs cannot sustain the performance improvements of the past decades.

In Section 2, we describe trends in transistor switching and wire transmission time, as well as our analytical wire delay model. The delay model is derived from capacitance extracted from a 3D field solver, using technology parameters from the Semiconductor Industry Association (SIA) technology roadmap (22). We use the model to estimate microarchitectural wiring delays in future technologies.

In Section 3, we describe our microarchitecture component models, which are based on the Cacti cache delay analysis tool (30). These models calculate access delay as a function of cache parameters and technology generation. We model most of the major components of a microprocessor core, such as caches, register files, and queues. We show that the inherent trade-off between access time and capacity will force designers to limit or even decrease the size of the structures to meet clock rate expectations. For example, our models show that in a 35nm implementation with a 10GHz clock, accessing even a 4KB level-one cache will require 3 clock cycles.

In Section 4, we report experimental results that show how the projected scaling of microarchitectural components affects overall performance.
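
The 4KB cache example can be read as a unit conversion from access time to clock cycles. The sketch below is a minimal illustration, not the paper's Cacti-based model: the function names are hypothetical, and the 250 ps access time is an assumed placeholder, since the text states only the cycle count. At 10GHz a cycle lasts 100 ps, so any access time above 200 ps and up to 300 ps rounds up to 3 cycles.

```python
import math

# Minimal sketch, not the paper's component model. Function names are
# hypothetical, and the 250 ps access time is an assumed placeholder value.


def clock_period_ps(clock_ghz: float) -> float:
    """Clock period in picoseconds for a clock frequency given in GHz."""
    return 1000.0 / clock_ghz


def access_cycles(access_time_ps: float, clock_ghz: float) -> int:
    """Whole clock cycles needed to cover a structure's access time."""
    return math.ceil(access_time_ps / clock_period_ps(clock_ghz))


if __name__ == "__main__":
    assumed_access_ps = 250.0  # assumption: any value in (200, 300] gives 3 cycles
    print(clock_period_ps(10.0), "ps per cycle at 10GHz")
    print(access_cycles(assumed_access_ps, 10.0), "cycles for the assumed access time")
```

The same conversion shows the trade-off described above: for a fixed access time, a faster clock raises the cycle count, so designers must either shrink the structure or accept a deeper pipeline.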
Using the results of our analytical models as inputs to SimpleScalar-based timing simulation, we track the performance of current microarchitectures when scaled from 250nm to 35nm technology, using different approaches for scaling the clock
