Architecture of Parallel Computers, CSC / ECE 506, Summer 2006
Scalable Programming Models
Lecture 11, 6/19/2006
Dr. Steve Hunter

Back to Basics

Parallel Architecture = Computer Architecture + Communication Architecture

Small-scale shared memory
- Extend the memory system to support multiple processors.
- Good for multiprogramming throughput and for parallel computing.
- Allows fine-grain sharing of resources.

Naming and synchronization
- Communication is implicit in loads and stores to shared addresses.
- Synchronization is performed by operations on shared addresses.

Latency and bandwidth
- Use the normal migration of data within the storage hierarchy (for example, into caches) to avoid long-latency operations and to reduce bandwidth.
- The shared medium is economical but has a fundamental bandwidth limit.
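As a minimal sketch of the shared-memory model, the Python fragment below uses threads as stand-ins for processors: they communicate through ordinary loads and stores to a shared location and synchronize through a lock. The names `shared`, `worker`, and `run` are hypothetical, and because of CPython's global interpreter lock the threads illustrate the programming model rather than parallel speedup.

```python
import threading

# Shared "memory": every thread names the same location directly.
shared = {"counter": 0}
# Synchronization object. In hardware, locks are typically built from atomic
# read-modify-write operations (e.g., test-and-set) on a shared address.
lock = threading.Lock()


def worker(iterations: int) -> None:
    for _ in range(iterations):
        with lock:
            # Communication is implicit: an ordinary load and store.
            value = shared["counter"]      # load from the shared location
            shared["counter"] = value + 1  # store to the shared location


def run(num_threads: int = 4, iterations: int = 10_000) -> int:
    shared["counter"] = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]


if __name__ == "__main__":
    print(run())  # 40000: the lock makes each load/store pair atomic
```

Without the lock, the load/add/store sequences of different threads could interleave and lose updates, which is why synchronization has to be expressed as operations on shared state.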
Realizing Programming Models
[Figure: layered view of a parallel system, top to bottom: parallel applications (CAD, database, scientific modeling); programming models (multiprogramming, shared address, message passing, data parallel); compilation or library; communication abstraction; user/system boundary; operating systems support; hardware/software boundary; communication hardware; physical communication medium.]
[Figure: conceptual picture of processors P1 through Pn and memory.]

Key Property
Large number of independent communication paths between nodes:
- Allows a large number of concurrent transactions using different wires.
- Transactions are initiated independently.
- There is no global arbitration.
- The effect of a transaction is visible only to the nodes involved.
- Effects are propagated through additional transactions.

Network Transaction Primitive
[Figure: the source node's output buffer sends a serialized data packet through the communication network to the destination node's input buffer.]
- One-way transfer of information from a source output buffer to a destination input buffer.
- Causes some action at the destination: deposit data, a state change, or a reply.
- Its occurrence is not directly visible at the source.
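The network transaction primitive can be sketched as a toy model in Python, assuming a network that delivers every packet reliably and that actions are identified by a string. `Network`, `Node`, `send`, `service`, and the `"deposit"`/`"read"` actions are hypothetical names; the point is that `send` returns nothing to the source, and the destination acts only when it services its input buffer.

```python
import json
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    memory: dict = field(default_factory=dict)
    input_buffer: deque = field(default_factory=deque)


class Network:
    """Toy network: each transaction is a one-way, serialized packet."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [Node(i) for i in range(num_nodes)]

    def send(self, src: int, dest: int, action: str, **args) -> None:
        # Serialize the packet and place it in the destination's input buffer.
        # Nothing comes back to the source: the transfer is one-way.
        packet = json.dumps({"src": src, "action": action, "args": args})
        self.nodes[dest].input_buffer.append(packet)

    def service(self, dest: int) -> None:
        # The destination drains its input buffer and performs each action.
        node = self.nodes[dest]
        while node.input_buffer:
            msg = json.loads(node.input_buffer.popleft())
            args = msg["args"]
            if msg["action"] == "deposit":     # deposit data / change state
                node.memory[args["addr"]] = args["value"]
            elif msg["action"] == "read":      # a reply is another transaction
                self.send(dest, msg["src"], "deposit",
                          addr=args["reply_addr"],
                          value=node.memory.get(args["addr"]))
```

For example, after `net.send(0, 1, "deposit", addr="x", value=42)`, node 1's memory is unchanged until `net.service(1)` runs, and node 0 learns nothing about the deposit unless node 1 sends it a further transaction, as the `"read"` action does.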
Bus Transactions vs. Network Transactions

Issue                           Bus           Network
Protection check                V->P          ??
Format                          wires         flexible
Output buffering                reg, FIFO     ??
Media arbitration               global        local
Destination naming and routing
Input buffering                 limited       many sources
Action
Completion detection

V->P: virtual-to-physical address translation.

Shared Address Space Abstraction
The source executes Load r <- [Global address]:
(1) Initiate memory access.
(2) Address translation.
(3) Local/remote check.
(4) Request transaction: a read request travels to the destination.
(5) Remote memory access at the destination while the source waits.
(6) Reply transaction: a read response travels back to the source.
(7) Complete memory access.
Characteristics: fixed format, request/response, simple action.

Consistency is challenging
P1:
    A = 1;
    flag = 1;
P2:
    while (flag == 0);
    print A;
P2 expects to print 1. If A and flag live in different memory modules and the write to A travels over a congested path, the write to flag can arrive first; P2 then leaves its loop and reads the old value of A.
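A minimal timing model of the flag example, assuming that each write is a one-way transaction that takes effect when it reaches its memory module, that P2's reads are instantaneous, and that latencies are small integers. The function `simulate` and its parameters are hypothetical; it returns the value of A that P2 prints.

```python
def simulate(latency_to_a: int, latency_to_flag: int,
             wait_for_ack: bool = False) -> int:
    """Return the value of A that P2 prints in the flag example.

    Each write by P1 is a one-way network transaction that takes effect at its
    memory module only when it arrives (issue time + path latency). P2's
    reads are assumed to take effect instantly, to keep the model small.
    """
    # P1: A = 1 is issued at time 0.
    a_arrives = 0 + latency_to_a
    # P1: flag = 1 is issued at time 1, or, if P1 waits for the write to A
    # to be acknowledged (ack assumed to take 1 time unit), after that ack.
    flag_issued = a_arrives + 1 if wait_for_ack else 1
    flag_arrives = flag_issued + latency_to_flag

    # P2 spins in while (flag == 0) until the write to flag is visible.
    t = 0
    while t < flag_arrives:
        t += 1
    # P2's print A sees the new value only if the write to A has arrived.
    return 1 if a_arrives <= t else 0
```

`simulate(1, 1)` returns 1, but `simulate(10, 1)`, where the path to A is congested, returns 0. Having P1 wait for an acknowledgment of the write to A before issuing the write to flag (`wait_for_ack=True`) restores the expected result at the cost of an extra round trip.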
Synchronous Message Passing
Source: Send(Pdest, local VA, len). Destination: Recv(Psrc, local VA, len).
(1) Initiate send.
(2) Address translation on Psrc.
(3) Local/remote check.
(4) Send-ready request (send-rdy req).
(5) Remote check for a posted receive (assume success), including a tag check; the source waits.
(6) Reply transaction (recv-rdy reply).
(7) Bulk data transfer from source VA to destination VA or ID (data-xfer req).

Asynchronous Message Passing: Optimistic
Source: Send(Pdest, local VA, len). Destination: Recv(Psrc, local VA, len).
(1) Initiate send.
(2) Address translation.
(3) Local/remote check.
(4) Send data (data-xfer req).
(5) Remote check for a posted receive (tag match); on failure, allocate a data buffer.

Asynchronous Message Passing: Conservative
Source: Send(Pdest, local VA, len). Destination: Recv(Psrc, local VA, len).
(1) Initiate send.
(2) Address translation on Pdest.
(3) Local/remote check.
(4) Send-ready request (send-rdy req).
(5) Remote check for a posted receive (assume failure); record the send-ready. The source returns and computes.
(6) When the receive is posted: tag check, then receive-ready request (recv-rdy req).
(7) Bulk data reply from source VA to destination VA or ID (data-xfer reply).
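The receiver side of the two asynchronous protocols can be sketched as follows, assuming messages are matched on a `(source, tag)` pair and that payloads are lists of integers. `Destination`, its methods, and the `fetch` callback (standing in for the conservative protocol's receive-ready request and bulk data reply) are hypothetical; the network and the synchronous protocol's blocking source are not modeled.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]  # (source processor, message tag) used for matching


class Destination:
    """Receiver-side state for the asynchronous send protocols (a sketch).

    `fetch(src, tag)` stands in for the receive-ready request and the bulk
    data reply of the conservative protocol; the network is not modeled.
    """

    def __init__(self, fetch: Callable[[int, int], List[int]]) -> None:
        self.fetch = fetch
        self.posted: Dict[Key, List[int]] = {}      # receives waiting for data
        self.unexpected: Dict[Key, List[int]] = {}  # optimistic: buffered data
        self.send_ready: deque = deque()            # conservative: recorded requests

    def on_data(self, src: int, tag: int, data: List[int]) -> None:
        """Optimistic protocol: the data arrives with the first transaction."""
        key = (src, tag)
        if key in self.posted:                      # tag match: deliver in place
            self.posted.pop(key).extend(data)
        else:                                       # no match: allocate a buffer
            self.unexpected[key] = list(data)

    def on_send_ready(self, src: int, tag: int) -> None:
        """Conservative protocol: only a send-ready request arrives."""
        key = (src, tag)
        if key in self.posted:                      # receive already posted
            self.posted.pop(key).extend(self.fetch(src, tag))
        else:                                       # record it; source keeps computing
            self.send_ready.append(key)

    def post_recv(self, src: int, tag: int) -> List[int]:
        """Post a receive; the returned list is filled when the data arrives."""
        key = (src, tag)
        buf: List[int] = []
        if key in self.unexpected:                  # copy out of the temporary buffer
            buf.extend(self.unexpected.pop(key))
        elif key in self.send_ready:                # pull the data from the source now
            self.send_ready.remove(key)
            buf.extend(self.fetch(src, tag))
        else:
            self.posted[key] = buf
        return buf
```

The trade-off shows up in the two branches of `post_recv`: a late receive under the optimistic protocol costs a temporary buffer and an extra copy, while under the conservative protocol it costs a round trip to the source before the data moves.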
Active Messages
[Figure: a request message invokes a request handler at the destination; the handler's reply invokes a reply handler at the source.]
- User-level analog of a network transaction.
- The action is a small user function.
- Request/reply structure.
- May also perform a memory-to-memory transfer.

Common Challenges
Input buffer overflow
- N-1 queue over-commitment: up to N-1 sources can send to one destination, so sources must be slowed down.
- Reserve space per source (credits). When is a slot available for reuse? Through an acknowledgment or a higher-level protocol.
- Refuse input when full:
  - Backpressure in a reliable network: causes tree saturation but is deadlock free. What happens to traffic not bound for the congested destination?
  - Reserve an ack back channel.
  - Drop packets.

Challenges (cont.)
Fetch deadlock
- For the network to remain deadlock free, nodes must continue accepting messages even when they cannot send any.
- What if an incoming transaction is a request? Each may generate a response, which cannot be sent. What happens when internal buffering is full?
Approaches:
- Logically independent request and reply networks, implemented as separate physical networks or as virtual channels with separate input/output queues.
- Bound the outstanding requests and reserve input buffer space: K(P-1) requests + K responses per node. With each of P nodes limited to K outstanding requests, a node can receive at most K requests from each of the other P-1 nodes plus K responses to its own requests.
- A service discipline to avoid fetch deadlock?
- NACK on input buffer full. How is the NACK itself delivered?

Summary
Scalability
- Physical, bandwidth, latency, and cost.
- Level of integration.
Realizing programming models
- Network transactions.
- Protocols.
- Safety: N-1 over-commitment and fetch deadlock.
Next: Communication Architecture Design Space
- How much hardware interpretation of the network transaction?
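The per-source reservation (credit) scheme from Common Challenges can be sketched in Python as follows, assuming K reserved input-buffer slots per source and an acknowledgment channel that never blocks (the reserved ack back channel). `CreditReceiver`, `CreditSender`, and `reserved_slots` are hypothetical names; `reserved_slots` evaluates the K(P-1) + K bound listed under fetch deadlock.

```python
from collections import deque
from typing import List, Tuple


def reserved_slots(p: int, k: int) -> int:
    """Input-buffer entries per node: K requests from each of the other
    P-1 nodes plus K responses to this node's own outstanding requests."""
    return k * (p - 1) + k


class CreditReceiver:
    """Destination that reserves k input-buffer slots for each source."""

    def __init__(self, num_sources: int, k: int) -> None:
        self.k = k
        self.queue: deque = deque()
        self.in_use: List[int] = [0] * num_sources  # occupied slots per source

    def deliver(self, src: int, msg: str) -> None:
        # A sender that holds a credit is guaranteed one of its reserved slots.
        assert self.in_use[src] < self.k, "sender exceeded its credits"
        self.in_use[src] += 1
        self.queue.append((src, msg))

    def consume(self) -> Tuple[int, str]:
        # Processing a message frees its slot; the caller returns the credit.
        src, msg = self.queue.popleft()
        self.in_use[src] -= 1
        return src, msg


class CreditSender:
    def __init__(self, src: int, k: int) -> None:
        self.src = src
        self.credits = k  # one credit per slot reserved at the destination

    def try_send(self, receiver: CreditReceiver, msg: str) -> bool:
        if self.credits == 0:
            return False  # out of credits: the source must slow down
        self.credits -= 1
        receiver.deliver(self.src, msg)
        return True

    def on_ack(self) -> None:
        self.credits += 1  # acknowledgment makes a slot available for reuse
```

Running out of credits stalls only the source that exhausted them; other sources keep their reserved slots, which is the advantage of reservation over relying on backpressure alone. For P = 8 nodes with K = 4 outstanding requests each, `reserved_slots(8, 4)` gives 32 entries per node.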