Quantitative analysis of vector code

1 January 1995

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 452-461
https://doi.org/10.1109/empdp.1995.389176

Abstract

We present the results of a detailed simulation study of the execution of vector programs on a single processor of a Convex C3480 machine, using a subset of the Perfect Club benchmarks. Our goal is to evaluate several cost/performance tradeoffs made by the machine's designers and to identify which architectural features most severely limit attainable performance. We report the detailed usage of the vector functional units and study the kinds of resource conflicts that stall the machine. The results show that the resources of the vector architecture are not used efficiently, mainly because of the single-bus memory architecture. Other severe limitations are the lack of chaining between vector loads and vector computations, and the lack of a second general-purpose functional unit. We also present data on port pressure on the vector register file and find that stalls due to port conflicts are relatively high. Finally, we consider the slowdown introduced by spill code and find that the limited number of vector registers also limits performance.
