A performance counter architecture for computing accurate CPI components
- 20 October 2006
- conference paper
- Published by Association for Computing Machinery (ACM)
- Vol. 40 (5), 175-184
- https://doi.org/10.1145/1168857.1168880
Abstract
A common way of representing processor performance is to use Cycles per Instruction (CPI) 'stacks' which break performance into a baseline CPI plus a number of individual miss event CPI components. CPI stacks can be very helpful in gaining insight into the behavior of an application on a given microprocessor; consequently, they are widely used by software application developers and computer architects. However, computing CPI stacks on superscalar out-of-order processors is challenging because of various overlaps among execution and miss events (cache misses, TLB misses, and branch mispredictions). This paper shows that meaningful and accurate CPI stacks can be computed for superscalar out-of-order processors. Using interval analysis, a novel method for analyzing out-of-order processor performance, we gain understanding into the performance impact of the various miss events. Based on this understanding, we propose a novel way of architecting hardware performance counters for building accurate CPI stacks. The additional hardware for implementing these counters is limited and comparable to existing hardware performance counter architectures while being significantly more accurate than previous approaches.
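As a rough illustration of the CPI stack idea described in the abstract, the sketch below splits an overall CPI into a base component plus one component per miss event. It is not the counter architecture proposed in the paper: the function name `cpi_stack`, the event names, and all numbers are hypothetical, and the sketch simply assumes that the cycles attributed to each miss event are already known and do not overlap, which is exactly the part the abstract says is hard on out-of-order processors.

```python
from typing import Dict


def cpi_stack(total_cycles: int, instructions: int,
              miss_cycles: Dict[str, int]) -> Dict[str, float]:
    """Split overall CPI into a base component plus one per miss event.

    Assumption (not from the paper): miss_cycles holds non-overlapping
    cycle counts already attributed to each miss event.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    attributed = sum(miss_cycles.values())
    if attributed > total_cycles:
        raise ValueError("attributed miss cycles exceed total cycles")

    # Whatever is not attributed to a miss event is counted as base CPI.
    stack = {"base": (total_cycles - attributed) / instructions}
    for event, cycles in miss_cycles.items():
        stack[event] = cycles / instructions
    return stack


# Hypothetical counter readings, chosen only for illustration.
example = cpi_stack(
    total_cycles=2_000_000,
    instructions=1_000_000,
    miss_cycles={
        "l2_dcache_miss": 600_000,
        "branch_mispredict": 300_000,
        "dtlb_miss": 100_000,
    },
)

for name, value in example.items():
    print(f"{name:>18}: {value:.2f}")
```

By construction the components sum to the overall CPI (total cycles divided by instructions), so the stack accounts for every cycle. How accurately cycles are attributed to each event is the question the paper addresses.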
This publication has 9 references indexed in Scilit:
- Characterizing the branch misprediction penalty. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Interaction cost and shotgun profiling. ACM Transactions on Architecture and Code Optimization, 2004
- Benchmarking internet servers on superscalar machines. Computer, 2003
- Pentium 4 performance-monitoring features. IEEE Micro, 2002
- Performance of database workloads on shared-memory systems with out-of-order processors. Published by Association for Computing Machinery (ACM), 1998
- Continuous profiling. ACM Transactions on Computer Systems, 1997
- Performance analysis using the MIPS R10000 performance counters. Published by Association for Computing Machinery (ACM), 1996
- Theoretical modeling of superscalar processor performance. Published by Association for Computing Machinery (ACM), 1994
- The Inhibition of Potential Parallelism by Conditional Jumps. IEEE Transactions on Computers, 1972