Characterization of Branch and Data Dependencies in Programs for Evaluating Pipeline Performance

1 July 1987

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers

Vol. C-36 (7) , 859-875
https://doi.org/10.1109/tc.1987.1676981

Abstract

The nature by which branches and data dependencies generate delays that degrade pipeline performance is investigated in this paper. We show that for the general execution trace, few specific delays can be considered in isolation; rather, the magnitude of any specific delay may depend on the relative proximity of other delays. This phenomenon can make the task of accurately characterizing a trace tape with simple statistics intractable. We present a set of trace reductions that facilitates this task by simplifying the corresponding data-dependency graph. The reductions operate on multiple data-dependency arcs and branches in conjunction; those arcs whose performance implications are redundant with respect to the dependency graph are identified, and eliminated from the graph. We show that the reduced graph can be accurately characterized by simple statistics. We use these statistics to show that as the length of a pipeline increases, the performance degradation due to data dependencies and branches increases monotonically. However, lengthening the pipeline may correspond to decreasing the cycle time of the pipeline. These two opposing effects are used in conjunction to derive an equation for optimal pipeline length for a given trace tape. The optimal pipeline length is shown to be characterized by n = √γα where γ is the ratio of overall circuit delay to latching overhead, and a is a function of the trace statistics that accounts for the delays induced by data dependencies and branches.

Keywords

This publication has 21 references indexed in Scilit:

Optimal pipelining in supercomputers
ACM SIGARCH Computer Architecture News, 1986
Dynamic Characteristics of Loops
IEEE Transactions on Computers, 1984
Dynamic Profile of Instruction Sequences for the IBM System/370
IEEE Transactions on Computers, 1983
Measurement and analysis of instruction use in the VAX-11/780
ACM SIGARCH Computer Architecture News, 1982
The Amdahl 470V/8 and the IBM 3033: A Comparison of Processor Designs
Computer, 1982
Computer system design using a hierarchical approach to performance evaluation
Communications of the ACM, 1980
Performance evaluation of highly concurrent computers by deterministic simulation
Communications of the ACM, 1978
Sequentiality and prefetching in database systems
ACM Transactions on Database Systems, 1978
Exploring a Stack Architecture
Computer, 1977
An instruction timing model of CPU performance
ACM SIGARCH Computer Architecture News, 1977