Computer Architecture - Seoul National University

Computer Architecture - Seoul National University

Computer Architecture Multiprocessors Memory/Storage Architecture Lab Shared Memory Shared memory multiprocessor Hardware provides single physical address space for all processors Synchronize shared variables using locks Memory access time UMA (uniform) vs. NUMA (nonuniform) Memory/Storage Architecture Lab

2 Example: Sum Reduction Sum 100,000 numbers on 100 processor UMA Each processor has ID: 0 Pn 99 Partition 1000 numbers per processor Initial summation on each processor sum[Pn] = 0; for (i = 1000*Pn; i < 1000*(Pn+1); i = i + 1) sum[Pn] = sum[Pn] + A[i]; Now need to add these partial sums Reduction: divide and conquer Half the processors add pairs, then quarter,

Need to synchronize between reduction steps Memory/Storage Architecture Lab 3 Example: Sum Reduction half = 100; repeat synch(); if (half%2 != 0 && Pn == 0) sum[0] = sum[0] + sum[half-1]; /* Conditional sum needed when half is odd; Processor0 gets missing element */ half = half/2; /* dividing line on who sums */ if (Pn < half) sum[Pn] = sum[Pn] + sum[Pn+half]; until (half == 1);

Memory/Storage Architecture Lab 4 Synchronization in Shared Memory Shared data type item = ; var buffer. Array [0..n-1] of item; in, out: 0..n-1; counter. 0..n; in, out, counter :=0; Producer process repeat

produce an item in nextp while counter = n do no-op; buffer [in] := nextp; in := in + 1 mod n; counter := counter + 1; until false; Memory/Storage Architecture Lab 5 Bounded-Buffer Example Consumer process repeat while counter = 0 do no-op; nextc := buffer [out];

out := out + 1 mod n; counter := counter - 1; consume the item in nextc until false; Memory/Storage Architecture Lab 6 More Detailed Picture counter := counter + 1 counter := counter - 1 register-a := counter;

register-b := counter; register-a := register-a + 1; register-b := register-b -1; counter := register-a; counter := register-b; Memory/Storage Architecture Lab 7 Problem producer producer consumer consumer

producer consumer execute execute execute execute execute execute register-a register-a register-b register-b counter := counter :=

:= counter := register-a + 1 := counter := register-b -1 register-a register-b Assuming counter is initially 5, what will be the final value of counter? Memory/Storage Architecture Lab 8 The Critical-Section Problem

n processes all competing to use some shared data Each process has a code segment, called critical section, in which the shared data is accessed. Problem ensure that when one process is executing in its critical section, no other process is allowed to execute in its critical section. Structure of process Pi repeat entry section critical section exit section remainder section until false; Memory/Storage Architecture Lab 9

Correctness Criteria for a Solution to the Critical-Section Problem Mutual Exclusion. If process Pi is executing in its critical section, then no other processes can be executing in their critical sections. Progress. If no process is executing in its critical section and there exist some processes that wish to enter their critical section, then the selection of the processes that will enter the critical section next cannot be postponed indefinitely. Bounded Waiting. A bound must exist on the number of times

that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted. Memory/Storage Architecture Lab 10 Mutual Exclusion with Test-and-Set Shared data: var lock: boolean (initially false) Process Pi repeat while Test-and-Set (lock) do no-op; critical section Entry Section

Exit Section lock := false; remainder section until false; Memory/Storage Architecture Lab 11 Naive Synchronization Entry Section Exit Section Memory/Storage Architecture Lab 12 Optimized Synchronization

Entry Section Exit Section Memory/Storage Architecture Lab 13 Message Passing Each processor has private physical address space Hardware sends/receives messages between processors Memory/Storage Architecture Lab 14

Loosely Coupled Clusters Network of independent computers Each has private memory and OS Connected using I/O system E.g., Ethernet/switch, Internet Suitable for applications with independent tasks Web servers, databases, simulations, High availability, scalable, affordable Problems

Administration cost (prefer virtual machines) Low interconnect bandwidth c.f. processor/memory bandwidth on an SMP Memory/Storage Architecture Lab 15 Sum Reduction (Again) Sum 100,000 on 100 processors First distribute 100 numbers to each

The do partial sums sum = 0; for (i = 0; i<1000; i = i + 1) sum = sum + AN[i]; Reduction Half the processors send, other half receive and add The quarter send, quarter receive and add, Memory/Storage Architecture Lab 16 Sum Reduction (Again)

Given send() and receive() operations limit = 100; half = 100;/* 100 processors */ repeat half = (half+1)/2; /* send vs. receive dividing line */ if (Pn >= half && Pn < limit) send(Pn - half, sum); if (Pn < (limit/2)) sum = sum + receive(); limit = half; /* upper limit of senders */ until (half == 1); /* exit with final sum */ Send/receive also provide synchronization Assumes send/receive take similar time to addition Memory/Storage Architecture Lab

17 Network Topology 2-D grid or mesh Memory/Storage Architecture Lab n-cube 18 Network Topology Crossbar Memory/Storage Architecture Lab Omega network

19 The Evolution-Revolution Spectrum of Computer Architecture Memory/Storage Architecture Lab 20

Recently Viewed Presentations