Out-of-order vector architectures

1 January 1997

proceedings article
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 160-170
https://doi.org/10.1109/micro.1997.645807

Abstract

Register renaming and out-of-order instruction issue are now commonly used in superscalar processors. These techniques can also be used to significant advantage in vector processors, as this paper shows. Performance is improved and available memory bandwidth is used more effectively. Using trace-driven simulation, we compare a conventional vector implementation, based on the Convex C3400, with an out-of-order, register-renaming vector implementation. When the number of physical registers is above 12, out-of-order execution coupled with register renaming provides a speedup of 1.24-1.72 for realistic memory latencies. Out-of-order techniques also tolerate main memory latencies of 100 cycles with a performance degradation of less than 6%. The mechanisms used for register renaming and out-of-order issue can be used to support precise interrupts, generally a difficult problem in vector machines. When precise interrupts are implemented, there is typically less than a 10% degradation in performance. A new technique based on register renaming is targeted at dynamically eliminating spill code; this technique is shown to provide an extra speedup ranging between 1.10 and 1.20 while reducing total memory traffic by an average of 15-20%.
