The Stanford FLASH multiprocessor

Abstract
The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine's global memory, a port to the interconnection network, an I/O interface, and a custom node controller called MAGIC. The MAGIC chip handles all communication, both within the node and among nodes, using hardwired data paths for efficient data movement and a programmable processor optimized for executing protocol operations. The use of the protocol processor makes FLASH very flexible (it can support a variety of different communication mechanisms) and simplifies the design and implementation. This paper presents the architecture of FLASH and MAGIC, and discusses the base cache-coherence and message-passing protocols. Latency and occupancy numbers, which are derived from our system-level simulator and our Verilog code, are given for several common protocol operations. The paper also describes our software strategy and FLASH's current status.
