Bluespec technical deep dive

Bluespec technical deep dive

Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-1 Shared Memory Systems P L1 P L1 P L1 L2 P L1 P L1

P L1 L2 Interconnect M Modern systems often have hierarchical caches Each cache has exactly one parent but can have zero or more children Logically only a parent and its children can communicate directly Inclusion property is maintained between a parent and its children, i.e., Because usually a Li a Li+1 L >> L November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 i+1 i L22-2 Cache-coherence problem CPU-1

A 100 CPU-2 200 cache-1 A 100 cache-2 CPU-Memory bus A 100 200 memory Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale values write-through: cache-2 has a stale value Do these stale values matter? What is the view of shared memory for programming? November 16, 2016

http://www.csg.csail.mit.edu/ 6.175 L22-3 Cache-Coherent Memory req res ... req res Monolithic Memory A monolithic memory processes one request at a time; it can be viewed as processing requests instantaneously A memory with hierarchy of caches is said to be coherent, if functionally it behaves like the monolithic memory November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-4

Maintaining Coherence In a coherent memory all loads and stores can be placed in a global order multiple copies of an address in various caches can cause this property to be violated This property can be ensured if: Only one cache at a time has the write permission for an address No cache can have a stale copy of the data after a write to the address has been performed cache coherence protocols are used to maintain coherence November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-5 Cache Coherence Protocols Write request:

the address is invalidated in all other caches before the write is performed Read request: if a dirty copy is found in some cache then that value is written back to the memory and supplied to the reader. Alternatively the dirty value can be forwarded directly to the reader Such protocols are called Invalidation-based November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-6 State needed to maintain Cache Coherence in lo a d va li d

I S sh flu re sto I - cache doesnt contain the address S- cache has the address but so may other caches; hence it can only be read M- only this cache has the address; hence it can be read and written at e MSI encoding store M write-back The states M, S, I can be thought of as an order M>S>I

Upgrade: A cache miss causes transition from a lower state to a higher state Downgrade: A write-back or invalidation causes a transition from a higher state to a lower state November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-7 Cache Actions On a read miss (i.e., Cache state is I): In case some other cache has the address in state M then write back the dirty data to Memory Read the value from Memory and set the state to S On a write miss (i.e., Cache state is I or S): Invalidate the address in all other caches and in case some cache has the address in state M then write

back the dirty data Read the value from Memory if necessary and set the state to M How do we know the state of other caches? November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 Directory L22-8 Directory State Encoding Two-level (L1, M) system S P a P L1 P L1 P L1 Interconnect

a A directory is maintained at each cache to keep track of the state of its childrens caches m.child[ck][a]: the state of child ck for address a; At most one child can be in state M In case of cache hierarchy > 2, the directory also keeps track of the sibling information c.state[a]: M means cs siblings do not have a copy of address a; S means they might November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-9 Directory state encoding transient states to deal with waiting for responses L1 wait state is captured in mshr

Directory in home memory m.waitc[ck][a] : Denotes if memory m is waiting for a response from its child ck No | Yes <[(M|S|I), (No | Yes)]> Childs state November 16, 2016 Waiting for downgrade response http://www.csg.csail.mit.edu/ 6.175 L22-10 A Directory-based Protocol an abstract view P p2m L1

PP P m2p c2m interconnect m2c m2p c2m m2c in p2m PP L1 out PP m Each cache has 2 pairs of queues (c2m, m2c) to communicate with the memory (p2m, m2p) to communicate with the processor

Message format: Req/Resp address state FIFO message passing between each (srcdst) pair except a Req cannot block a Resp Messages in one srcdst path cannot block messages in another srcdst path November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-11 Processing misses: Requests and Responses Cache Cache 3,5,7 1,5,8 3 1 5 2 6

2,6 2,4 Memory November 16, 2016 3,5,7 1,5,8 4 1 2 3 4 5 6 7 8 Up-req send (cache) Up-req proc, Up resp send (memory) Up-resp proc (cache) Dn-req send (memory) Dn-req proc, Dn resp send (cache) Dn-resp proc (memory) Dn-req proc, drop (cache)

Voluntary Dn-resp (cache) http://www.csg.csail.mit.edu/ 6.175 L22-12 CC protocol for blocking caches Extension to the Blocking L1 design discussed in L17-18 November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-13 Req method hit processing method Action req(MemReq r) if(mshr == Ready); let a = r.addr; P let hit = contains(state, a); if(hit) begin p2m m2p let slot = getSlot(state, a); c2m

let x = dataArray[slot]; PP L1 m2c if(r.op == Ld) hitQ.enq(x); else // it is store if (isStateM(state[slot]) dataArray[slot] <= r.data; else begin missReq <= r; mshr <= SendFillReq; missSlot <= slot; end end else begin missReq <= r; mshr <= StartMiss; end // (1) endmethod November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-14 Start-miss and Send-fill rules Rdy -> StrtMiss -> SndFillReq -> WaitFillResp -> Resp -> Rdy rule startMiss(mshr == StartMiss); let slot = findVictimSlot(state); if(!isStateI(state[slot])) begin // write-back (Evacuate) let a = getAddr(state[slot]); let d = (isStateM(state[slot])? dataArray[slot]: -); state[slot] <= (I, _); c2m.enq(m, a, I, d>); end

mshr <= SendFillReq; missSlot <= slot; endrule rule sendFillReq (mshr == SendFillReq); let upg = (missReq.op == Ld)? S : M; c2m.enq(m, missReq.addr, upg, - >); mshr <= WaitFillResp; endrule // (1) http://www.csg.csail.mit.edu/ November 16, 2016 6.175 L22-15 Wait-fill rule and Proc Resp rule Rdy -> StrtMiss -> SndFillReq -> WaitFillResp -> Resp -> Rdy rule waitFillResp(mshr == WaitFillResp); let c, a, cs, d> = m2c.msg; let slot = missSlot; dataArray[slot] <= (missReq.op == Ld)? d : missReq.data; state[slot] <= (cs, a); m2c.deq; mshr <= Resp; endrule // (3) rule sendProc(mshr == Resp); if(missReq.op == Ld) begin c2p.enq(dataArray[slot]); end mshr <= Ready; endrule http://www.csg.csail.mit.edu/

November 16, 2016 6.175 L22-16 Parent Responds rule parentResp; let m,a,y,-> = c2m.msg; if((ic, isCompatible(m.child[i][a],y)) && (m.waitc[c][a]=No)) begin let d = ((m.child[c][a]=I)? m.data[a]: -); m2c.enq(c, a, y, d); m.child[c][a]:=y; c2m.deq; end IsCompatible(M, M) = False endrule IsCompatible(M, S) = False IsCompatible(S, M) = False All other cases = True November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-17 Parent (Downgrade) Requests

rule dwn; let m,a,y,-> = c2m.msg; if (!isCompatible(m.child[i][a], y) && (m.waitc[i][a]=No)) begin m.waitc[i][a] <= Yes; m2c.enq(i, a, (y==M?I:S), - >); end Endrule // (4) This rule will execute as long some child cache is not compatible with the incoming request November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-18 Parent receives Response rule dwnRsp; let m, a, y, data> = c2m.msg; c2m.deq; if(m.child[c][a]=M) m.data[a]<=data; m.child[c][a]<=y; m.waitc[c][a]<=No; endrule // (6) November 16, 2016 http://www.csg.csail.mit.edu/

6.175 L22-19 Child Responds rule dng(mshr != Resp); let = m2c.msg; let slot = getSlot(state,a); if(getCacheState(state[slot])>y) begin let d = (isStateM(state[slot])? dataArray[slot]: -); c2m.enq(m, a, y, d>); state[slot] := (y,a); end // the address has already been downgraded m2c.deq; endrule // (5) and (7) November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-20 Child Voluntarily downgrades rule startMiss(mshr == Ready); let slot = findVictimSlot(state); if(!isStateI(state[slot])) begin // write-back (Evacuate) let a = getAddr(state[slot]);

let d = (isStateM(state[slot])? dataArray[slot]: -); state[slot] <= (I, _); c2m.enq(m, a, I, d>); end endrule // (8) Rules 1 to 8 are complete - cover all possibilities and cannot deadlock or violate cache invariants November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-21 Invariants for a CC-protocol design Directory state is always a conservative estimate of a childs state E.g., if directory thinks that a child cache is in S state then the cache has to be in either I or S state For every request there is a corresponding response, though sometimes it is generated even before the request is processed Communication system has to ensure that

responses cannot be blocked by requests a request cannot overtake a response for the same address At every merger point for requests, we will assume fair arbitration to avoid starvation November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-22 MSI protocol: some issues It never makes sense to have two outstanding requests for the same address from the same processor/cache It is possible to have multiple requests for the same address from different processors. Hence there is a need to arbitrate requests A cache needs to be able to evict an address in order to make room for a different address Voluntary downgrade Memory system (higher-level cache) should be able to force a lower-level cache to downgrade

caches need to keep track of the state of their childrens caches November 16, 2016 http://www.csg.csail.mit.edu/ 6.175 L22-23

Recently Viewed Presentations

  • Chemistry 1A General Chemistry Instructor: Dr. Orlando E.

    Chemistry 1A General Chemistry Instructor: Dr. Orlando E.

    = 291.4 torr. 291.4 torr. 1 atm. 760 torr = 0.3834 atm. 0.3834 atm. 101.325 kPa. 1 atm = 38.85 kPa. The relationship between the volume and pressure of a gas. Boyle's Law. The relationship between the volume and temperature...
  • Getting knowledge into action. Professor Sue Dopson WWW.SBS.OXFORD.EDU

    Getting knowledge into action. Professor Sue Dopson WWW.SBS.OXFORD.EDU

    Prof Sue Dopson SAID BUSINESS SCHOOL WWW.SBS.OXFORD.EDU EDUCATING LEADERS FOR 800 YEARS Prof Sue Dopson SAID BUSINESS SCHOOL WWW.SBS.OXFORD.EDU EDUCATING LEADERS FOR 800 YEARS . Title: Slide 1 ... Templeton College Other titles:
  • Chapter 21: Verb Shifts

    Chapter 21: Verb Shifts

    Chapter 21: Verb Shifts. Verb Tense Chart. 1. Past perfect. had +-ed. furthest in the past; happened before. verb form another past action. Past Perfect Tense Example: The Easter Bunny . had hopped. down the trail. 2. Past-ed. verb form...
  • &#x27;Glasgow Sonnets (i)&#x27; Edwin Morgan - Mrs Ruxton

    'Glasgow Sonnets (i)' Edwin Morgan - Mrs Ruxton

    It is generally accepted that the original tenements, like the one described in the poem, should have been renovated rather than destroyed. 'Glasgow Sonnet (i)' Edwin Morgan A mean wind wanders through the backcourt trash. Hackles on puddles rise, old...
  • Trust Positions: Policy and Recommendations Presentation Outline: I.

    Trust Positions: Policy and Recommendations Presentation Outline: I.

    Trust Positions: Policy and Recommendations Trust Personnel* a. Identification of Trust Positions: The following shall be considered positions of trust: the Executive Secretary for Integral Development, designated Director General of the IACD, as well as the Assistant Secretaries, the advisors...
  • Angle Measures in Polygons - Administration

    Angle Measures in Polygons - Administration

    Example Find x x+x+55+55=360 2x+110=360 2x=250 X=125 Polygon Interior Angles Theorem The sum of the measures of the interior angles of a convex n-gon is (n-2)•180°. Corollary to the Polygon Interior Angles Theorem The measure of each interior angle of...
  • Experimental Studies of High-speed Liquid Films on Downward ...

    Experimental Studies of High-speed Liquid Films on Downward ...

    Times New Roman Wingdings Arial 굴림 Arial Black Symbol Tahoma Wingdings 2 Wingdings 3 Webdings 바탕 films Microsoft Excel Chart Canvas 7 Drawing Adobe Photoshop Image Microsoft Equation 3.0 MathType 4.0 Equation An Overview of the Fluid Dynamics Aspects of...
  • Emergency Preparedness and Response (Module 16)

    Emergency Preparedness and Response (Module 16)

    is the area surrounding the facility within the security perimeter, fence or other designated property marker. It is the area under the immediate control of the facility or operator. ... rendering large territories uninhabitable for decades with enormous clean-up efforts...