Continual flow pipelines
- 7 October 2004
- conference paper
- Published by Association for Computing Machinery (ACM)
- Vol. 32 (5), 107-119
- https://doi.org/10.1145/1024393.1024407
Abstract
Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects. How to build a processor that provides high single-thread performance and enables multiple of these to be placed on the same die for high throughput while dynamically adapting for future applications? Conventional approaches for high single-thread performance rely on large and complex cores to sustain a large instruction window for memory tolerance, making them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP) as a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that to achieve benefits of a large instruction window, inefficiencies in management of both the scheduler and register file must be addressed, and we propose a unified solution. The non-blocking property of CFP keeps key processor structures affecting cycle time and power (scheduler, register file), and die size (second level cache) small. The memory latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores for single-thread applications.