Scalable hardware memory disambiguation for high ILP processors
- 25 May 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
This paper describes several methods for improving the scalability of memory disambiguation hardware for future high ILP processors. As the number of in-flight instructions grows with issue width and pipeline depth, the load/store queues (LSQs) threaten to become a bottleneck in both power and latency. By employing lightweight approximate hashing in hardware with structures called Bloom filters, many improvements to the LSQ are possible. We propose two types of filtering schemes using Bloom filters: search filtering, which uses hashing to reduce both the number of lookups to the LSQ and the number of entries that must be searched, and state filtering, in which the number of entries kept in the LSQs is reduced by coupling address predictors and Bloom filters, permitting smaller queues. We evaluate these techniques for LSQs indexed by both instruction age and the instruction's effective address, and for both centralized and physically partitioned LSQs. We show that search filtering avoids up to 98% of the associative LSQ searches, providing significant power savings and keeping LSQ searches to under one high-frequency clock cycle. We also show that with state filtering, the load queue can be eliminated altogether with only minor reductions in performance for small instruction window machines.
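The idea behind search filtering can be illustrated with a small software model. The sketch below is not the paper's hardware design: the class names, the counter count, the two hash functions, and the choice of a counting Bloom filter (so that addresses can be removed when a store commits) are all assumptions made for illustration. It shows a load consulting the filter first and skipping the associative store-queue search whenever the filter reports that no in-flight store can match.

```python
class CountingBloomFilter:
    """Counting Bloom filter: never a false negative, sometimes a false positive."""

    def __init__(self, num_counters=64, num_hashes=2):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, address):
        # Hypothetical hash family (an assumption, not taken from the paper):
        # mix the 8-byte word address with a different multiplier and shift per hash.
        word = address >> 3
        n = len(self.counters)
        return [(word * (2 * i + 1) + (word >> (4 * (i + 1)))) % n
                for i in range(self.num_hashes)]

    def insert(self, address):
        for i in self._indexes(address):
            self.counters[i] += 1

    def remove(self, address):
        for i in self._indexes(address):
            self.counters[i] -= 1

    def may_contain(self, address):
        return all(self.counters[i] > 0 for i in self._indexes(address))


class FilteredStoreQueue:
    """Age-ordered store queue whose associative search is guarded by a filter."""

    def __init__(self):
        self.entries = []            # (age, address) of in-flight stores, oldest first
        self.filter = CountingBloomFilter()
        self.searches_performed = 0
        self.searches_avoided = 0

    def issue_store(self, age, address):
        self.entries.append((age, address))
        self.filter.insert(address)

    def commit_oldest_store(self):
        _, address = self.entries.pop(0)
        self.filter.remove(address)

    def find_forwarding_store(self, load_age, address):
        """Return the age of the youngest older store to `address`, or None."""
        if not self.filter.may_contain(address):
            # Filter miss: no in-flight store has this address, so skip the search.
            self.searches_avoided += 1
            return None
        # Filter hit (possibly a false positive): do the full associative search.
        self.searches_performed += 1
        matches = [age for age, a in self.entries if a == address and age < load_age]
        return max(matches) if matches else None


sq = FilteredStoreQueue()
sq.issue_store(age=1, address=0x1000)
sq.issue_store(age=2, address=0x2000)
forwarded_from = sq.find_forwarding_store(load_age=5, address=0x1000)
```

Because a Bloom filter can report false positives but never false negatives, skipping the search on a filter miss is always safe in this model; a false positive only costs a search that finds nothing.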