MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms
Apr 9, 2012
Heechul Yun+, Gang Yao+, Rodolfo Pellizzoni*, Marco Caccamo+, Lui Sha+
+University of Illinois at Urbana-Champaign
*University of Waterloo

Real-Time Applications
  Resource-intensive real-time applications
    Multimedia processing (*), real-time data analytics (**), object tracking
  Requirements
    Need more performance at lower cost: Commercial Off-The-Shelf (COTS)
    Performance guarantee (i.e., temporal predictability and isolation)
  (*) ARM, QoS for High-Performance and Power-Efficient HD Multimedia, 2010
  (**) Intel, The Growing Importance of Big Data and Real-Time Analytics, 2012

Modern System-on-Chip (SoC)
  More cores
    Freescale P4080 has 8 cores
  More sharing
    Shared memory hierarchy (LLC, MC, DRAM)
    Shared I/O channels
  More performance, less cost
  But, isolation?

Problem: Shared Memory Hierarchy
  [Figure: Parts 1-4 on Core1-Core4, sharing the last level cache, memory controller, and DRAM]
  Shared Last Level Cache (LLC): space contention
  Memory Controller (MC): access contention
  DRAM
  Shared hardware resources
  OS has little control

Memory Performance Isolation
  [Figure: Parts 1-4 on Core1-Core4, four LLC blocks, memory controller, DRAM]
  Q. How to guarantee worst-case performance?
  Need to guarantee memory performance

Inter-Core Memory Interference
  [Figure: slowdown ratio due to interference. Labels: foreground, background, 470.lbm, X-axis, (1.5GB/s), 410.bwaves, (1.4GB/s), 471.omnetpp]
  Significant slowdown (up to 2x on 2 cores)
  Slowdown is not proportional to memory bandwidth usage

Background: DRAM Chip
  [Figure: DRAM chip with Banks 1-4 and Rows 1-5, serving READ (Bank 1, Row 3, Col 7). Labels: activate, precharge, Col 7, row buffer, read/write]
  State-dependent access latency
    Row miss: 19 cycles, row hit: 9 cycles (*)
  (*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting)

Background: Memory Controller (MC)
  [Figure: Bruce Jacob et al., Memory Systems: Cache, DRAM, Disk, Fig. 13.1]
  Request queue
    Buffers read/write requests from CPU cores
    Unpredictable queuing delay due to reordering

Background: MC Queue Re-ordering
  Initial queue (2 row switches)
    Core1: READ Row 1, Col 1
    Core2: READ Row 2, Col 1
    Core1: READ Row 1, Col 2
  Reordered queue (1 row switch)
    Core1: READ Row 1, Col 1
    Core1: READ Row 1, Col 2
    Core2: READ Row 2, Col 1
  Improves row hit ratio and throughput
  Unpredictable queuing delay

Challenges for Real-Time Systems
  Memory controller (MC) level queuing delay
    Main source of interference
    Unpredictable (re-ordering)
  DRAM state-dependent latency
    DRAM row open/close state
  [Figure: Cores 1-4, memory controller, predictable memory controller, DRAM]
  State of the art
    Predictable DRAM controller h/w: [Akesson07] [Paolieri09] [Reineke11]
    Not in COTS

Our Approach
  OS-level approach
    Works on commodity h/w
  [Figure: Cores 1-4, OS control mechanism, memory controller, DRAM]
  Guarantees performance of each core
  Maximizes memory performance if possible

MemGuard
  [Figure: MemGuard, operating system, PMC, multicore processor (cores), memory controller, DRAM DIMM]
  Memory bandwidth reservation and reclaiming

Memory Bandwidth Reservation
  Idea
    OS monitors and enforces each core's memory bandwidth usage
  [Figure: budget and core activity over time (0, 1 ms, 2 ms), marking "enqueue tasks" and "dequeue tasks" events. Legend: computation, memory fetch]

Memory Bandwidth Reservation
  Key insight
    B/W regulators control memory request rates
    If the request rate of the cores stays below the service rate of the DRAM, queuing delay in the memory controller is minimal
  System-wide reservation rule
    Reserve up to the guaranteed bandwidth r_min:
    $\sum_{i=1}^{m} B_i \le r_{min}$
    m: number of cores
    B_i: core i's b/w reservation

Guaranteed Bandwidth: r_min
  Worst-case DRAM performance (service rate)
    All memory requests go to the same bank (no bank-level parallelism) and cause row misses
  Example (PC6400-DDR2 (*))
    Peak B/W: 6.4 GB/s
      64 bytes I/O = 10 ns, command latency hidden by interleaving
    Calculated guaranteed B/W: 1.3 GB/s
      PRE + ACT + RD + I/O (8 x 8 bytes) = 47.5 ns
    Measured guaranteed B/W: 1.2 GB/s
  Performance isolation
    Sum of memory b/w reservations ≤ guaranteed b/w
  (*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting)

Memory Access Pattern
  [Figure: two plots of memory requests over time (ms)]
  Memory access patterns vary over time
  Static reservation is inefficient
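
The guaranteed-bandwidth arithmetic and the reservation rule from the slides above can be checked with a short Python sketch. This is an illustration under stated assumptions, not MemGuard code: the 2.5 ns command clock and the reading of 5-5-5 as 5 cycles each for PRE, ACT and RD are assumptions chosen because they reproduce the 47.5 ns on the slide; the 1 ms period is taken from the time axis of the reservation figure; the function names are hypothetical.

```python
CACHE_LINE_BYTES = 64        # one burst of 8 x 8 bytes, as on the slide
IO_NS = 10.0                 # transfer time for 64 bytes, from the slide
CYCLE_NS = 2.5               # assumption: 400 MHz command clock for PC6400-DDR2
PRE_NS = ACT_NS = RD_NS = 5 * CYCLE_NS   # assumption: 5 cycles each (5-5-5)


def bandwidth_gb_per_s(bytes_per_access: float, ns_per_access: float) -> float:
    """Bytes per nanosecond is numerically equal to GB/s (10^9 bytes/s)."""
    return bytes_per_access / ns_per_access


# Peak: command latency is hidden, so only the data transfer counts.
peak_gb_s = bandwidth_gb_per_s(CACHE_LINE_BYTES, IO_NS)

# Worst case: every request hits the same bank and misses the open row,
# so each access pays precharge + activate + read + data transfer.
worst_case_ns = PRE_NS + ACT_NS + RD_NS + IO_NS
r_min_calculated_gb_s = bandwidth_gb_per_s(CACHE_LINE_BYTES, worst_case_ns)


def reservation_is_safe(reservations_mb_s: list, r_min_mb_s: int) -> bool:
    """System-wide rule: the sum of per-core reservations must not exceed r_min."""
    return sum(reservations_mb_s) <= r_min_mb_s


def budget_in_cache_lines(bandwidth_mb_s: int, period_us: int = 1000) -> int:
    """Convert a bandwidth reservation into a per-period access budget.

    MB/s times microseconds gives bytes; dividing by the cache line size
    gives the number of memory accesses allowed per period.
    """
    return bandwidth_mb_s * period_us // CACHE_LINE_BYTES


if __name__ == "__main__":
    print(f"peak: {peak_gb_s:.1f} GB/s")
    print(f"worst-case access time: {worst_case_ns:.1f} ns")
    print(f"calculated r_min: {r_min_calculated_gb_s:.2f} GB/s")
    # Using the measured r_min of 1.2 GB/s (1200 MB/s) from the slide.
    print(reservation_is_safe([1000, 200], 1200))
    print(budget_in_cache_lines(1000))
```

Integer MB/s values are used for the rule check so that the comparison against r_min is exact rather than subject to floating-point rounding.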
Memory Bandwidth Reclaiming
  Key objective
    Redistribute excess bandwidth to demanding cores
    Improve memory b/w utilization
  Predictive bandwidth donation and reclaiming
    Donate unneeded budget predictively
    Reclaim on demand

Reclaim Example
  Time 0: initial budget 3 for both cores
  Time 3, 4: decrement budgets
  Time 10 (period 1): predictive donation (total 4)
  Time 12, 15: Core 1 reclaims
  Time 16: Core 0 reclaims
  Time 17: Core 1 has no remaining budget; dequeue tasks
  Time 20 (period 2): Core 0 donates 2; Core 1 does not donate

Best-effort Bandwidth Sharing
  Key objective
    Utilize best-effort bandwidth whenever possible
  Best-effort bandwidth
    Available after all cores have used their budgets (i.e., the guaranteed bandwidth has been delivered) and before the next period begins
  Sharing policy
    Maximize throughput
    Broadcast to all cores to continue executing

Evaluation Platform
  [Figure: Intel Core2Quad with Cores 0-3, each with I and D caches, two L2 caches, memory controller, DRAM]
  Intel Core2Quad 8400, 4MB L2 cache, PC6400 DDR2 DRAM
    Prefetchers were turned off for evaluation
    MemGuard has also been ported to the PowerPC-based P4080 (8 cores) and the ARM-based Exynos4412 (4 cores)
  Modified Linux kernel 3.6.0 + MemGuard kernel module
    https://github.com/heechul/memguard/wiki/MemGuard
  Used all 29 benchmarks from SPEC2006 and synthetic benchmarks

[Figures: evaluation setups on Intel Core2 with shared memory. Labels: 462.libquantum (foreground, C0), memory hogs (background, C2), 470.lbm (background, C2), normalized IPC]

Conclusion
  Inter-core memory interference
    Big challenge for multi-core based real-time systems
    Sources: queuing delay in MC, state-dependent latency in DRAM
  MemGuard
    OS mechanism providing efficient per-core memory performance isolation on COTS H/W
    Memory bandwidth reservation and reclaiming support
    https://github.com/heechul/memguard/wiki/MemGuard
Thank you.

Effect of Reclaim
  IPC improvement of background ([email protected]/s) is 3.8x
  IPC reduction of foreground ([email protected]/s) is 3%

Reclaim Underrun Error

Effect of Spare Sharing
  IPC of background ([email protected]/s) improves 40%
  IPC of foreground ([email protected]/s) also improves 9%

Isolation and Throughput
  Effect of r_min
  4-core configuration

Isolation Effect of Reservation
  Core 2: 0.2 to 2.0 GB/s for lbm
  Core 0: 1.0 GB/s for X-axis
  Solo [email protected]/s
  Sum of b/w reservations < r_min (1.2 GB/s): isolation
  1.0 GB/s (X-axis) + 0.2 GB/s (lbm) = r_min

Effect of MemGuard
  Soft real-time application on each core
  Provides differentiated memory bandwidth
    Weight for each core = 1:2:4:8 for the guaranteed b/w; spare bandwidth sharing is enabled

Hard/Soft Reservation on MemGuard
  Hard reservation (w/o reclaiming)
    Can guarantee memory bandwidth B_i regardless of other cores at each period
    Wasted if not used
  Soft reservation (w/ reclaiming)
    Misprediction can cause a missed b/w guarantee at each period
    Error rate is small: less than 5%
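
The trade-off between hard and soft reservation can be illustrated with a toy user-space model in Python. It is a sketch under assumptions and not the MemGuard kernel module: the predictor (a core is expected to use what it used in the previous period), the reclaim granularity (one access at a time), and all class and function names are hypothetical choices made for illustration, since the slides do not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Core:
    reserved: int                    # guaranteed accesses per period
    budget: int = 0                  # accesses left in the current period
    used: int = 0                    # accesses granted in the current period
    used_last: Optional[int] = None  # usage in the previous period (predictor input)
    throttled: bool = False          # True once the core is stopped for this period


class Regulator:
    def __init__(self, reserved: List[int], reclaim: bool):
        self.cores = [Core(r) for r in reserved]
        self.reclaim = reclaim       # False: hard reservation, True: soft reservation
        self.pool = 0                # donated budget available for reclaiming
        self._started = False

    def start_period(self) -> None:
        self.pool = 0
        for c in self.cores:
            if self._started:
                c.used_last = c.used
            c.used = 0
            c.throttled = False
            if self.reclaim and c.used_last is not None:
                # Assumed predictor: next usage equals last usage, capped at the reservation.
                predicted = min(c.reserved, c.used_last)
            else:
                predicted = c.reserved
            c.budget = predicted
            self.pool += c.reserved - predicted   # donate the predicted surplus
        self._started = True

    def access(self, core_id: int) -> bool:
        """Return True if the memory access is granted, False if the core is throttled."""
        c = self.cores[core_id]
        if c.throttled:
            return False
        if c.budget == 0:
            if self.reclaim and self.pool > 0:
                self.pool -= 1       # reclaim one access from the shared pool
                c.budget += 1
            else:
                c.throttled = True   # stands in for dequeuing the core's tasks
                return False
        c.budget -= 1
        c.used += 1
        return True


def run(reclaim: bool) -> List[Dict[int, int]]:
    """Two periods, two cores with a reservation of 3 each; returns granted accesses."""
    reg = Regulator([3, 3], reclaim)
    # Each period lists (core, demanded accesses) in the order the cores issue them.
    periods = [[(0, 1), (1, 5)], [(1, 5), (0, 3)]]
    granted = []
    for demands in periods:
        reg.start_period()
        granted.append({core: sum(reg.access(core) for _ in range(n))
                        for core, n in demands})
    return granted


if __name__ == "__main__":
    print("hard:", run(reclaim=False))
    print("soft:", run(reclaim=True))
```

In this model the hard reservation always grants each core up to its 3 reserved accesses, while the soft reservation lets core 1 reach 5 accesses in the second period at the cost of core 0 receiving only 1 of the 3 it was guaranteed, which is the kind of misprediction the last slide refers to.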