Lecture 11: Scalable Programming Models

Architecture of Parallel Computers
CSC / ECE 506, Summer 2006
Scalable Programming Models
Lecture 11, 6/19/2006
Dr Steve Hunter

Back to Basics
- Parallel Architecture = Computer Architecture + Communication Architecture
- Small-scale shared memory
  - extend the memory system to support multiple processors
  - good for multiprogramming throughput and parallel computing
  - allows fine-grain sharing of resources
- Naming & synchronization
  - communication is implicit in stores/loads of shared addresses
  - synchronization is performed by operations on shared addresses
- Latency & bandwidth
  - utilize the normal migration of data within the storage hierarchy to avoid long-latency operations and to reduce bandwidth
  - economical medium with a fundamental bandwidth limit
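A minimal Python sketch of the "Naming & synchronization" point, assuming threads stand in for processors that share one address space. The `shared` dict and the `threading.Event` used as a flag are illustrative choices, not part of the lecture: the data moves by an ordinary store and load, and the only coordination is an operation on a shared object.

```python
import threading

# Hypothetical shared location: every thread names it directly, no explicit messages.
shared = {"A": 0}
# Assumption: a threading.Event stands in for a synchronization flag kept in shared memory.
flag = threading.Event()

def producer():
    shared["A"] = 42          # communication: an ordinary store to a shared address
    flag.set()                # synchronization: an operation on a shared object

def consumer(seen):
    flag.wait()               # block until the producer has set the flag
    seen.append(shared["A"])  # communication: an ordinary load from the same address

seen = []
threads = [threading.Thread(target=consumer, args=(seen,)),
           threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen)  # [42]
```

The consumer is started first on purpose: it can only observe 42 because the wait on the shared flag orders its load after the producer's store.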

Realizing Programming Models
[Figure: layers of abstraction, top to bottom]
  Parallel applications: CAD, database, scientific modeling
  Programming models: multiprogramming, shared address, message passing, data parallel
  Compilation or library
  Communication abstraction
  ---- user/system boundary ----
  Operating systems support
  ---- hardware/software boundary ----
  Communication hardware
  Physical communication medium
[Figure: conceptual picture with processors P1 ... Pn and memory]

Key Property
- Large number of independent communication paths between nodes
  - allow a large number of concurrent transactions using different wires
- Transactions are initiated independently
- No global arbitration
- Effect of a transaction is only visible to the nodes involved
  - effects are propagated through additional transactions

Network Transaction Primitive
[Figure: a serialized data packet travels through the communication network from the source node's output buffer to the destination node's input buffer]
- One-way transfer of information from a source output buffer to a destination input buffer
  - causes some action at the destination: deposit data, state change, or reply
  - occurrence is not directly visible at the source
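A minimal Python sketch of the network transaction primitive, assuming an idealized network that delivers every queued packet in one `network_step` call. `Packet`, `Node`, the `"deposit"`/`"ping"` kinds, and `network_step` are hypothetical names. `send` only appends to the source's output buffer, so the source learns nothing about delivery, and a reply is simply a new transaction in the opposite direction.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Packet:
    src: int
    dest: int
    kind: str              # hypothetical action selector: "deposit" or "ping"
    payload: object = None

@dataclass
class Node:
    nid: int
    memory: dict = field(default_factory=dict)
    output_buffer: deque = field(default_factory=deque)
    input_buffer: deque = field(default_factory=deque)

    def send(self, dest, kind, payload=None):
        # One-way: the source only enqueues; delivery is not visible here.
        self.output_buffer.append(Packet(self.nid, dest, kind, payload))

    def handle(self):
        # Each arriving packet causes some action at the destination.
        while self.input_buffer:
            pkt = self.input_buffer.popleft()
            if pkt.kind == "deposit":
                addr, value = pkt.payload
                self.memory[addr] = value                          # deposit data
            elif pkt.kind == "ping":
                self.send(pkt.src, "deposit", ("pong", self.nid))  # reply = new transaction

def network_step(nodes):
    # Idealized network: move every packet from output buffers to destination input buffers.
    for node in nodes:
        while node.output_buffer:
            pkt = node.output_buffer.popleft()
            nodes[pkt.dest].input_buffer.append(pkt)

nodes = [Node(i) for i in range(2)]
nodes[0].send(1, "deposit", ("x", 7))
nodes[0].send(1, "ping")
network_step(nodes); nodes[1].handle()   # node 1 stores x=7 and queues a reply
network_step(nodes); nodes[0].handle()   # node 0 sees the reply only now
print(nodes[1].memory, nodes[0].memory)  # {'x': 7} {'pong': 1}
```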

Bus Transactions vs Net Transactions
Issue                            Bus          Net
protection check                 V->P         ??
format                           wires        flexible
output buffering                 reg, FIFO    ??
media arbitration                global       local
destination naming and routing
input buffering                  limited      many source
action
completion detection

Shared Address Space Abstraction
Source: Load r <- [Global address]
(1) Initiate memory access
(2) Address translation
(3) Local/remote check
(4) Request transaction (read request sent to the destination)
(5) Remote memory access at the destination; the source waits
(6) Reply transaction (read response)
(7) Complete memory access
- Fixed format, request/response, simple action

Consistency is challenging
  P1:              P2:
  A = 1;           while (flag == 0);
  flag = 1;        print A;
[Figure: (a) P1, P2, P3, each with its own memory, on an interconnection network; A:0 and flag:0->1 live in different memories; 1: A=1, 2: flag=1, 3: load A, with a delay on the write to A. (b) The delay comes from a congested path.]

Synchronous Message Passing
Source: Send Pdest, local VA, len    Destination: Recv Psrc, local VA, len
(1) Initiate send
(2) Address translation on Psrc
(3) Local/remote check
(4) Send-ready request (send-rdy req)
(5) Remote check for posted receive (assume success), with tag check; the source waits
(6) Reply transaction (recv-rdy reply)
(7) Bulk data transfer from source VA to dest VA or ID (data-xfer req)

Asynchronous Message Passing: Optimistic
Source: Send (Pdest, local VA, len)    Destination: Recv Psrc, local VA, len
(1) Initiate send
(2) Address translation
(3) Local/remote check
(4) Send data (data-xfer req)
(5) Remote check for posted receive (tag match); on fail, allocate a data buffer

Asynchronous Message Passing: Conservative
Source: Send Pdest, local VA, len    Destination: Recv Psrc, local VA, len
(1) Initiate send
(2) Address translation on Pdest
(3) Local/remote check
(4) Send-ready request (send-rdy req)
(5) Remote check for posted receive (assume fail); record send-ready; the source returns and computes
(6) Receive-ready request once the Recv is posted, with tag check (recv-rdy req)
(7) Bulk data reply from source VA to dest VA or ID (data-xfer reply)
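The optimistic and conservative protocols differ in where the data waits when a send arrives before the matching Recv. A minimal Python sketch, assuming one source and one destination in the same process; `Source`, `Destination`, `optimistic_send`, `conservative_send`, and `bulk_data_reply` are hypothetical names, and each method call stands in for one or more network transactions.

```python
class Source:
    """Hypothetical sender state: a conservative send leaves its data here."""
    def __init__(self, nid):
        self.nid = nid
        self.pending = {}                  # tag -> data awaiting a receive-ready request

    def bulk_data_reply(self, tag):
        # Step (7) of the conservative protocol: data moves only once asked for.
        return self.pending.pop(tag)


class Destination:
    """Hypothetical receiver-side matching for the two asynchronous protocols."""
    def __init__(self):
        self.posted = {}        # (src, tag) -> user buffer supplied by an earlier Recv
        self.unexpected = {}    # optimistic: data that arrived before its Recv
        self.send_ready = {}    # conservative: recorded send-ready requests
        self.buffers_allocated = 0

    def optimistic_send(self, src, tag, data):
        key = (src.nid, tag)
        if key in self.posted:                    # tag match: deliver into the user buffer
            self.posted.pop(key).extend(data)
        else:                                     # on fail, allocate a data buffer
            self.unexpected[key] = list(data)
            self.buffers_allocated += 1

    def conservative_send(self, src, tag, data):
        src.pending[tag] = list(data)             # data stays at the source
        key = (src.nid, tag)
        if key in self.posted:                    # receive already posted: request data now
            self.posted.pop(key).extend(src.bulk_data_reply(tag))
        else:                                     # record send-ready; source returns and computes
            self.send_ready[key] = src

    def recv(self, src, tag, buf):
        key = (src.nid, tag)
        if key in self.unexpected:                # optimistic data is already buffered here
            buf.extend(self.unexpected.pop(key))
        elif key in self.send_ready:              # receive-ready request, answered by bulk data
            buf.extend(self.send_ready.pop(key).bulk_data_reply(tag))
        else:                                     # nothing yet: post the receive
            self.posted[key] = buf


p0, dest = Source(0), Destination()
a, b, c = [], [], []
dest.optimistic_send(p0, 1, [1, 2])      # no Recv yet: destination allocates a buffer
dest.conservative_send(p0, 2, [3, 4])    # no Recv yet: only send-ready is recorded
dest.recv(p0, 1, a)
dest.recv(p0, 2, b)
dest.recv(p0, 3, c)                      # this time the Recv is posted before the send
dest.optimistic_send(p0, 3, [5])
print(a, b, c, dest.buffers_allocated)   # [1, 2] [3, 4] [5] 1
```

In the optimistic case the destination must find space for data it did not ask for; the conservative case never allocates that buffer, at the cost of an extra round trip before the data moves.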

Active Messages
[Figure: a request runs a request handler at the destination; the reply runs a reply handler at the source]
- User-level analog of a network transaction
  - action is a small user function
- Request/reply
- May also perform memory-to-memory transfer

Common Challenges
- Input buffer overflow
  - N-1 queue over-commitment => must slow sources
  - reserve space per source (credit)
    - when is it available for reuse? Ack or higher level
  - refuse input when full
    - backpressure in a reliable network
    - tree saturation
    - deadlock free
    - what happens to traffic not bound for the congested destination?
  - reserve ack back channel
  - drop packets

Challenges (cont.)
- Fetch deadlock
  - for the network to remain deadlock free, nodes must continue accepting messages even when they cannot source them
  - what if the incoming transaction is a request?
    - each may generate a response, which cannot be sent!
    - what happens when internal buffering is full?
- Logically independent request/reply networks
  - physical networks
  - virtual channels with separate input/output queues
- Bound requests and reserve input buffer space
  - K(P-1) requests + K responses per node
  - service discipline to avoid fetch deadlock?
- NACK on input buffer full
  - NACK delivery?

Summary
- Scalability
  - physical, bandwidth, latency and cost
  - level of integration
- Realizing programming models
  - network transactions
  - protocols
  - safety
    - N-1
    - fetch deadlock
- Next: Communication Architecture Design Space
  - how much hardware interpretation of the network transaction?
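The "reserve space per source (credit)" remedy from Common Challenges can be sketched in Python. This is a minimal illustration under stated assumptions: `CreditedInput` is a hypothetical name, the credit counters are kept at the receiver for brevity rather than at each source, and an ack is modeled as an immediate credit return when a message is consumed.

```python
from collections import deque

class CreditedInput:
    """Hypothetical input buffer with space reserved per source (credits).

    For brevity the per-source credit counters live here; in a real design each
    source would hold its own count and regain a credit when an ack returns.
    """
    def __init__(self, sources, slots_per_source):
        self.capacity = len(sources) * slots_per_source
        self.buffer = deque()
        self.credits = {s: slots_per_source for s in sources}

    def try_send(self, src, msg):
        if self.credits[src] == 0:
            return False                  # out of credit: the source must slow down
        self.credits[src] -= 1
        self.buffer.append((src, msg))
        return True

    def consume(self):
        src, msg = self.buffer.popleft()  # the destination frees one slot ...
        self.credits[src] += 1            # ... and the ack hands the credit back
        return msg


# N-1 sources all target one destination (the N-1 over-commitment case).
rx = CreditedInput(sources=[1, 2, 3], slots_per_source=2)
sent = sum(rx.try_send(s, f"m{s}.{i}") for s in [1, 2, 3] for i in range(5))
print(sent, len(rx.buffer), rx.capacity)               # 6 6 6
rx.consume()                                           # frees a slot owned by source 1
print(rx.try_send(1, "late"), rx.try_send(2, "late"))  # True False
```

Because each source can only fill its own reserved slots, the N-1 sources together never exceed the buffer capacity, and a source that runs out of credit is slowed rather than dropped. Bounding outstanding requests per node, as in the K(P-1) requests + K responses budget, applies the same reservation idea to fetch deadlock.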
