Continual flow pipelines

7 October 2004

conference paper
Published by Association for Computing Machinery (ACM)

Vol. 32 (5), pp. 107-119
https://doi.org/10.1145/1024393.1024407

Abstract

Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects: how to build a processor that provides high single-thread performance, allows multiple such cores to be placed on the same die for high throughput, and adapts dynamically to future applications. Conventional approaches to high single-thread performance rely on large, complex cores that sustain a large instruction window for memory latency tolerance, which makes them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP), a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that achieving the benefits of a large instruction window requires addressing inefficiencies in the management of both the scheduler and the register file, and we propose a unified solution. The non-blocking property of CFP keeps small the key processor structures that affect cycle time and power (scheduler, register file) and die size (second-level cache). The memory-latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores on single-thread applications.
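The abstract does not describe the mechanism in detail, so the following is only a toy model of why a "non-blocking" scheduler helps; it is not the paper's CFP design. It assumes a scheduler-only view (the register file is ignored), single-cycle operations, a fixed hypothetical 100-cycle miss latency, and an unbounded side buffer into which instructions that depend on an outstanding miss are parked so they stop occupying scheduler entries. All names (`Insn`, `simulate`, `slice_buf`, `MISS_LATENCY`, `example_program`) are hypothetical.

```python
from dataclasses import dataclass

INF = float("inf")
MISS_LATENCY = 100  # hypothetical memory latency in cycles (assumption)


@dataclass(frozen=True)
class Insn:
    idx: int
    srcs: tuple = ()       # indices of producer instructions
    is_miss: bool = False  # long-latency load that misses the cache


def simulate(prog, sched_size, width, non_blocking):
    """Toy model: in-order allocation, out-of-order issue. Returns total cycles."""
    done_at = {}    # idx -> cycle at which the result becomes available
    sched = []      # scheduler entries, bounded by sched_size
    slice_buf = []  # parked miss-dependent insns (unbounded; non-blocking mode only)
    nxt, cycle = 0, 0

    def pending_miss(s):
        # True if producer s is a cache miss whose data has not yet returned.
        return prog[s].is_miss and done_at.get(s, INF) > cycle

    while nxt < len(prog) or sched or slice_buf:
        # 1. Re-insert parked instructions whose miss has returned, oldest first.
        parked = {i.idx for i in slice_buf}
        keep = []
        for insn in slice_buf:
            if len(sched) < sched_size and not any(
                    pending_miss(s) or s in parked for s in insn.srcs):
                sched.append(insn)
                parked.discard(insn.idx)
            else:
                keep.append(insn)
        slice_buf = keep

        # 2. Allocate up to `width` new instructions in program order.
        for _ in range(width):
            if nxt == len(prog) or len(sched) >= sched_size:
                break  # a full scheduler stalls the front end
            sched.append(prog[nxt])
            nxt += 1

        # 3. Non-blocking mode: move miss-dependent instructions out of the scheduler.
        sched.sort(key=lambda i: i.idx)
        if non_blocking:
            parked = {i.idx for i in slice_buf}
            stay = []
            for insn in sched:
                if any(pending_miss(s) or s in parked for s in insn.srcs):
                    slice_buf.append(insn)
                    parked.add(insn.idx)
                else:
                    stay.append(insn)
            sched = stay
            slice_buf.sort(key=lambda i: i.idx)

        # 4. Issue up to `width` ready instructions, oldest first.
        issued = 0
        for insn in list(sched):
            if issued == width:
                break
            if all(done_at.get(s, INF) <= cycle for s in insn.srcs):
                done_at[insn.idx] = cycle + (MISS_LATENCY if insn.is_miss else 1)
                sched.remove(insn)
                issued += 1
        cycle += 1
    return max(done_at.values())


def example_program():
    # One missing load, a chain of 8 instructions that depend on it,
    # then 100 instructions that are independent of the miss.
    prog = [Insn(0, (), True)]
    prog += [Insn(i, (i - 1,)) for i in range(1, 9)]
    prog += [Insn(i) for i in range(9, 109)]
    return prog


if __name__ == "__main__":
    prog = example_program()
    for label, size, nb in (("small, blocking", 4, False),
                            ("small, non-blocking", 4, True),
                            ("large, blocking", 200, False)):
        print(label, simulate(prog, sched_size=size, width=2, non_blocking=nb))
```

With a 4-entry scheduler, the blocking run fills up with the miss-dependent chain and stalls until the miss returns, whereas the non-blocking run parks that chain and keeps executing the independent instructions, so it should finish in roughly the time a much larger (200-entry) blocking scheduler would need. This mirrors, in a very simplified way, the abstract's claim of large-window performance from a small scheduler.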
