EECS 152 Computer Architecture and Engineering Lec 01 ...

EECS 152 Computer Architecture and Engineering Lec 01 ...

CSE 486/586 Distributed Systems Cache Coherence Steve Ko Computer Sciences and Engineering University at Buffalo CSE 486/586 Storage to Memory Weve looked at storage consistency. The same consistency models are equally applicable to memory. Think multiple threads accessing the same memory addresses But a memory system can have another form of consistency mainly for managing caches. Well look at this today. In a multi-core system, there are many caches, and they need to be synchronized. CSE 486/586 2 Caching Basics: CPU-Memory Bottleneck CPU

Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency (time for a single access) Memory access time >> Processor cycle time Problematic Bandwidth (number of accesses per unit time) Increase the bus size, etc. Usually OK CSE 486/586 3 Physical Size Affects Latency CPU CPU Small Memory Big Memory Signals have further to travel Fan out to more locations CSE 486/586

4 Memory Hierarchy A CPU Small, Fast Memory (RF, SRAM) B Big, Slow Memory (DRAM) holds frequently used data capacity: Register << SRAM << DRAM (cost) latency: Register << SRAM << DRAM (size) bandwidth: on-chip >> off-chip (delays) On a data access: if data fast memory low latency access (SRAM) If data fast memory long latency access (DRAM) CSE 486/586

5 Inside a Cache Address Processor Address CACHE Data Data copy of main memory location 100 Address Tag Main Memory copy of main memory location 101

100 Data Data Byte Byte 304 Data Byte Line 6848 416 Data Block CSE 486/586 Cache Read Look at Processor Address, search cache tags to find match. Then either Found in cache a.k.a. HIT Return copy of data from cache

Not in cache a.k.a. MISS Read block of data from Main Memory Wait Return data to processor and update cache (Use a replacement algorithm to select a line to replace) CSE 486/586 Cache Write Cache hit: write through: write both cache & memory write back: write cache only, memory is written only when the entry is evicted Cache miss: no write allocate: only write to main memory write allocate (aka fetch on write): fetch into cache Common combinations: write through and no write allocate write back with write allocate

CSE 486/586 8 Administrivia PA3 grading still going on This Friday, no recitation, undergrad office hours from 2 pm 4 pm & general office hours from 4 pm 5 pm CSE 486/586 9 Memory Coherence in SMPs CPU-1 A 100 CPU-2 cache-1 A 100 cache-2 CPU-Memory bus

A 100 memory Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale values write-through: cache-2 has a stale value Do these stale values matter? What kind of guarantee do you get? CSE 486/586 10 Cache Coherence A cache coherence protocol ensures that all writes by one processor are eventually visible to other processors i.e., updates are not lost You can consider this a hardware-based update propagation mechanism for distributed caches. Hardware support is required such that only one processor at a time has write permission for a location no processor can load a stale copy of the location after a write CSE 486/586

11 Cache Coherence A memory system is coherent if: A read by a processor P to a location X that follows a write by P to X, with no writes of X by another processor occurring between the write and the read by P, always returns the value written by P. A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses. Writes to the same location are serialized; that is, two writes to the same location by any two processors are seen in the same order by all processors. (Coherence provides per-location sequential consistency). CSE 486/586 12 One Design: Snoopy Cache CPU-1 A 100 CPU-2

cache-1 A 100 cache-2 CPU-Memory bus A 100 memory Cache controllers work together to maintain cache coherence. Each cache controller snoops on the bus traffic and do the right thing. CSE 486/586 13 Snoopy Cache Coherence Protocol For a read operation, a cache line is shared across multiple caches. For a write operation, a cache line is not shared. Each cache line has a state: M (modified): the processor has written to it. S (shared): other caches have a copy as well.

I (invalid): the data is no longer valid. Writing to a cache line If its M, the cache controller does the write. If it is not M, it sends an invalidation request to other caches, switches the state to M, and does the write. Other cache controllers switch the state to I. Reading a memory address If its a hit, read it. If its not a hit, read it from memory, and other cache controllers switch the state to S. CSE 486/586 14 Cache State Transition Diagram The MSI protocol Each cache line has state bits Address tag state bits M: Modified S: Shared I: Invalid Write miss

(P1 gets line from memory) Other processor reads (P1 writes back) Read miss (P1 gets line from memory) P1 Read by any processor S to t en t in M ite r w Other processor intent to write CSE 486/586

P1 reads or writes Other processor intent to write (P1 writes back) I Cache state in processor P1 15 Two Processor Example (Reading and writing the same cache line) P1 reads P1 writes P2 reads P2 writes P1 reads P1 writes P2 writes P1 Read miss P1 writes

P2 Read miss P2 reads, P1 writes back P1 S M tt n e int P1 reads, P2 writes back P2 S tt n

e int P2 intent to write I M P2 reads or writes Write miss ite r ow P1 intent to write CSE 486/586 Write miss ite r ow P2 intent to write P1 reads

or writes P1 intent to write I 16 Observation Other processor reads P1 writes back Read miss Read by any processor P1 S to t en t in M

Write miss ite r w Other processor intent to write P1 reads or writes Other processor intent to write I If a line is in the M state then no other cache can have a copy of the line! Memory stays coherent, multiple differing copies cannot exist CSE 486/586 17 MESI: An Enhanced MSI protocol increased performance for private data Each cache line has a tag M: Modified Exclusive E: Exclusive but unmodified

S: Shared I: Invalid Address tag state bits Write miss P1 write or read M Other processor reads P1 writes back Read miss, shared Read by any processor S P1 write P1 intent to write Other processor

reads E Other processor intent to write, P1 writes back Other processor intent to write CSE 486/586 P1 read Read miss, not shared Other processor intent to write I Cache state in processor P1 18 Scalable Approach: Directories Every memory block has associated directory information keeps track of copies of cached blocks and their states on a miss, find directory entry, look it up, and communicate

only with the nodes that have copies if necessary in scalable networks, communication with directory and copies is through network transactions Many alternatives for organizing directory information CSE 486/586 19 Basic Operation of Directory k processors. With each cache-block in memory: k presence-bits, 1 dirty-bit With each cache-block in cache: 1 valid bit, and 1 dirty (owner) bit Read from main memory by processor i: If dirty-bit OFF then { read from main memory; turn p[i] ON; } if dirty-bit ON then { recall line from dirty proc (downgrade cache state to shared); update memory; turn dirty-bit OFF; turn p[i] ON; supply recalled data to i;} Write to main memory by processor i: If dirty-bit OFF then {send invalidations to all caches that have the block; turn dirty-bit ON; supply data to i; turn p[i] ON; ... }

CSE 486/586 20 Directory Cache Protocol CPU CPU CPU CPU CPU CPU Cache Cache Cache Cache Cache Cache

Interconnection Network Directory Controller Directory Controller Directory Controller Directory Controller DRAM Bank DRAM Bank DRAM Bank DRAM Bank Assumptions: Reliable network, FIFO message delivery between any given source-destination pair CSE 486/586 21 Cache States

For each cache line, there are 4 possible states: C-invalid (= Nothing): The accessed data is not resident in the cache. C-shared (= Sh): The accessed data is resident in the cache, and possibly also cached at other sites. The data in memory is valid. C-modified (= Ex): The accessed data is exclusively resident in this cache, and has been modified. Memory does not have the most up-to-date data. C-transient (= Pending): The accessed data is in a transient state (for example, the site has just issued a protocol request, but has not received the corresponding protocol reply). CSE 486/586 22 Home directory states For each memory block, there are 4 possible states: R(dir): The memory block is shared by the sites specified in dir (dir is a set of sites). The data in memory is valid in this state. If dir is empty (i.e., dir = ), the memory block is not ), the memory block is not cached by any site. W(id): The memory block is exclusively cached at site id, and has been modified at that site. Memory does not have the most up-to-date data. TR(dir): The memory block is in a transient state waiting for the acknowledgements to the invalidation requests that the

home site has issued. TW(id): The memory block is in a transient state waiting for a block exclusively cached at site id (i.e., in C-modified state) to make the memory block at the home site up-todate. CSE 486/586 23 Summary Cache coherence Making sure that caches do not contain stale copies. Snoopy cache coherence MSI MESI Directory-based Uses a directory per memory bank CSE 486/586 24 Acknowledgements These slides heavily contain material developed and copyright by Krste Asanovic (MIT/UCB) David Patterson (UCB)

And also by: Arvind (MIT) Joel Emer (Intel/MIT) James Hoe (CMU) John Kubiatowicz (UCB) MIT material derived from course 6.823 UCB material derived from course CS252 CSE 486/586 25

Recently Viewed Presentations

  • Donation vs. Supply

    Donation vs. Supply

    Overview. The availability of safe blood is a key component of current medical practice. The U.S. blood supply network of federally-regulated but non-governmental independent nonprofit organizations has been robust for more than 50 years.
  • Language Development - 國立臺灣大學

    Language Development - 國立臺灣大學

    Protoimperative. Gesture intended to get another person to do something for the child. Using Gestures. Beyond Gesturing….. Receptive Language. Expressive/Productive Language. Receptive Language -Early Speech Perception.
  • Strategies for Enhancing Sport-Related Self-Confidence

    Strategies for Enhancing Sport-Related Self-Confidence

    Impact on athletes What Causes of burnout Chronic Stress Erosion of Motivation Prevention strategies Defining Features Emotional & Physical Exhaustion Sport Devaluation Low sense of Personal Accomplishment Raedeke (1997) Raedeke & Smith (2009) Other key features Relatively Chronic State Different...
  • OUTLINE GAMETOGENESIS a. Oogenesis stages of meiosis in

    OUTLINE GAMETOGENESIS a. Oogenesis stages of meiosis in

    Oogenesis stages of meiosis in the female ovarian maturation, hormones and cycles ovulation b. Spermatogenesis stages of meiosis in the male differentiation (spermiogenesis) FERTILIZATION Where it takes place How it takes place EARLY DEVELOPMENT
  • Group III Pollution-Tolerant Benthic Macroinvertebrates Aquatic Worms Aquatic

    Group III Pollution-Tolerant Benthic Macroinvertebrates Aquatic Worms Aquatic

    They can be either pale or red in color, and will wiggle in your ice cube tray. Do not put them in the same ice cube compartment as dragonflies, damselflies, hellgrammites, fishflies, or alderflies as they may be eaten! Slide...
  • Bioreactor Systems Ka-Yiu San A Fermenter / Bioreactor

    Bioreactor Systems Ka-Yiu San A Fermenter / Bioreactor

    Bioreactor Systems Ka-Yiu San Typical cell growth on microcarriers Typical cell growth on microcarriers Perfusion system - to provide fresh nutrient - to remove waste (especially toxic byproducts - mechanical signal Questions? * by Genentech, Corporate Communication A Fermenter /...
  • Chapter 4 POLITICAL SOCIALIZATION AND PUBLIC OPINION

    Chapter 4 POLITICAL SOCIALIZATION AND PUBLIC OPINION

    Shared activities and civic virtue. The "right of association is as inalienable as the right of personal liberty"; promotes an avenue of discourse for a "common undertaking." A common motive or passion establishes stability within the U.S. Tocqueville warns of...
  • Skills Lesson 1 - Primary Resources

    Skills Lesson 1 - Primary Resources

    In the back of your exercise book... Grammar Starter: Analogies. Definition: An analogy is a comparison between two things that are usually thought to be different from each other but that have something in common.