Characterization of Branch and Data Dependencies in Programs for Evaluating Pipeline Performance
- 1 July 1987
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. C-36 (7) , 859-875
- https://doi.org/10.1109/tc.1987.1676981
Abstract
The nature by which branches and data dependencies generate delays that degrade pipeline performance is investigated in this paper. We show that for the general execution trace, few specific delays can be considered in isolation; rather, the magnitude of any specific delay may depend on the relative proximity of other delays. This phenomenon can make the task of accurately characterizing a trace tape with simple statistics intractable. We present a set of trace reductions that facilitates this task by simplifying the corresponding data-dependency graph. The reductions operate on multiple data-dependency arcs and branches in conjunction; those arcs whose performance implications are redundant with respect to the dependency graph are identified, and eliminated from the graph. We show that the reduced graph can be accurately characterized by simple statistics. We use these statistics to show that as the length of a pipeline increases, the performance degradation due to data dependencies and branches increases monotonically. However, lengthening the pipeline may correspond to decreasing the cycle time of the pipeline. These two opposing effects are used in conjunction to derive an equation for optimal pipeline length for a given trace tape. The optimal pipeline length is shown to be characterized by n = √γα where γ is the ratio of overall circuit delay to latching overhead, and a is a function of the trace statistics that accounts for the delays induced by data dependencies and branches.Keywords
This publication has 21 references indexed in Scilit:
- Optimal pipelining in supercomputersACM SIGARCH Computer Architecture News, 1986
- Dynamic Characteristics of LoopsIEEE Transactions on Computers, 1984
- Dynamic Profile of Instruction Sequences for the IBM System/370IEEE Transactions on Computers, 1983
- Measurement and analysis of instruction use in the VAX-11/780ACM SIGARCH Computer Architecture News, 1982
- The Amdahl 470V/8 and the IBM 3033: A Comparison of Processor DesignsComputer, 1982
- Computer system design using a hierarchical approach to performance evaluationCommunications of the ACM, 1980
- Performance evaluation of highly concurrent computers by deterministic simulationCommunications of the ACM, 1978
- Sequentiality and prefetching in database systemsACM Transactions on Database Systems, 1978
- Exploring a Stack ArchitectureComputer, 1977
- An instruction timing model of CPU performanceACM SIGARCH Computer Architecture News, 1977