Tarantula
- 1 May 2002
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 30 (2), 281-292
- https://doi.org/10.1145/545214.545247
Abstract
Tarantula is an aggressive floating point machine targeted at technical, scientific and bioinformatics workloads, originally planned as a follow-on candidate to the EV8 processor [6, 5]. Tarantula adds to the EV8 core a vector unit capable of 32 double-precision flops per cycle. The vector unit fetches data directly from a 16 MByte second-level cache with a peak bandwidth of sixty-four 64-bit values per cycle. The whole chip is backed by a memory controller capable of delivering over 64 GBytes/s of raw bandwidth. Tarantula extends the Alpha ISA with new vector instructions that operate on new architectural state. Salient features of the architecture and implementation are: (1) it fully integrates into a virtual-memory cache-coherent system without changes to its coherency protocol, (2) provides high bandwidth for non-unit stride memory accesses, (3) supports gather/scatter instructions efficiently, (4) fully integrates with the EV8 core with a narrow, streamlined interface, rather than acting as a co-processor, (5) can achieve a peak of 104 operations per cycle, and (6) achieves excellent "real-computation" per transistor and per watt ratios. Our detailed simulations show that Tarantula achieves an average speedup of 5X over EV8, out of a peak speedup in terms of flops of 8X. Furthermore, performance on gather/scatter-intensive benchmarks such as Radix Sort is also remarkable: a speedup of almost 3X over EV8 and 15 sustained operations per cycle. Several benchmarks exceed 20 operations per cycle.
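The memory access patterns named in the abstract (unit stride, non-unit stride, gather/scatter) can be illustrated with a minimal Python sketch. It shows only the generic semantics of these access patterns on a flat array. The function names `strided_load`, `gather`, and `scatter` are hypothetical and are not Tarantula instruction names, and nothing here models the actual hardware. The two constants at the end restate arithmetic that follows directly from the figures quoted in the abstract.

```python
# Illustrative semantics only: hypothetical helper names, not the Tarantula ISA.

def strided_load(memory, base, stride, vl):
    """Read vl elements starting at index base, stepping by stride elements."""
    return [memory[base + i * stride] for i in range(vl)]

def gather(memory, base, indices):
    """Read one element per entry of indices, each offset from base."""
    return [memory[base + idx] for idx in indices]

def scatter(memory, base, indices, values):
    """Write values[i] to memory[base + indices[i]] for each i."""
    for idx, val in zip(indices, values):
        memory[base + idx] = val

memory = list(range(100, 132))            # 32 elements holding 100..131

unit = strided_load(memory, 0, 1, 4)      # unit stride: consecutive elements
strided = strided_load(memory, 0, 8, 4)   # non-unit stride: every 8th element
gathered = gather(memory, 0, [5, 0, 17])  # arbitrary index vector

scatter(memory, 0, [5, 0, 17], [-1, -2, -3])

# Arithmetic taken from the abstract's own figures:
# sixty-four 64-bit (8-byte) values per cycle from the second-level cache.
L2_PEAK_BYTES_PER_CYCLE = 64 * 8
# Average speedup of 5X out of a peak flops speedup of 8X.
FRACTION_OF_PEAK_SPEEDUP = 5 / 8
```

In this sketch, unit-stride and strided loads differ only in the `stride` argument, while gather and scatter take an explicit index vector, which is the pattern that benchmarks such as Radix Sort rely on.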
This publication has 9 references indexed in Scilit:
- Asim: a performance model framework. Computer, 2002
- New tiling techniques to improve cache temporal locality. Published by Association for Computing Machinery (ACM), 1999
- Adding a vector unit to a superscalar processor. Published by Association for Computing Machinery (ACM), 1999
- Exploiting instruction- and data-level parallelism. IEEE Micro, 1997
- Exploiting choice. Published by Association for Computing Machinery (ACM), 1996
- Spert-II: a vector microprocessor system. Computer, 1996
- Conflict-free access for streams in multimodule memories. IEEE Transactions on Computers, 1995
- Simultaneous multithreading. Published by Association for Computing Machinery (ACM), 1995
- Increasing the number of strides for conflict-free vector access. Published by Association for Computing Machinery (ACM), 1992