
CS194-24: Advanced Operating Systems Structures and Implementation
Lecture 13: Two-Level Scheduling (Con't); Segmentation/Paging/Virtual Memory
March 18th, 2013
Prof. John Kubiatowicz
http://inst.eecs.berkeley.edu/~cs194-24

Goals for Today
- Two-Level Scheduling/Tessellation (con't)
- Segmentation/Paging/Virtual Memory (review)
- Interactive is important! Ask questions!
Note: Some slides and/or pictures in the following are adapted from slides © 2013.
3/18/13 Kubiatowicz CS194-24 UCB Fall 2013 Lec 13.2

Recall: Example of DRF vs. Asset vs. CEEI
- Resources: <1000 CPUs, 1000 GB>
- 2 users: A's tasks need <2 CPU, 3 GB> and B's tasks need <5 CPU, 1 GB>
(Figure: bar charts of the CPU and memory shares given to users A and B under (a) DRF, (b) Asset Fairness, and (c) CEEI.)

Recall: Two-Level Scheduling: Control vs. Data Plane
- Split monolithic CPU and resource scheduling into two pieces:
  - Coarse-grained: resource allocation and distribution to Cells
    - Chunks of resources (CPUs, memory bandwidth, QoS to services)
    - Ultimately a hierarchical process negotiated with service providers
  - Fine-grained (user-level) application-specific scheduling
    - Applications allowed to utilize their resources in any way they see fit
    - Performance isolation: other components of the system cannot interfere with a Cell's use of resources

Recall: Adaptive Resource-Centric Computing (ARCC)
- Resource allocation (control plane): partitioning and distribution of resource assignments
(Figure: the running system (data plane) holds applications and services, such as the GUI, block, and network services, each with its own QoS-aware scheduler, connected by channels; observation and modeling feed performance reports back to the control plane.)
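The DRF example above can be checked numerically with a toy "progressive filling" sketch (an assumption about the mechanism, not code from the lecture): repeatedly hand one task to the user with the smallest dominant share until the next task no longer fits.

```python
# Toy check of the DRF example: total <1000 CPUs, 1000 GB>,
# user A's tasks need <2 CPU, 3 GB>, user B's tasks need <5 CPU, 1 GB>.
TOTAL = (1000, 1000)                 # (CPUs, GB)
DEMANDS = {"A": (2, 3), "B": (5, 1)}

def drf_allocate(total, demands):
    tasks = {u: 0 for u in demands}
    while True:
        # Dominant share = max over resources of (user's usage / capacity)
        def dom(u):
            return max(tasks[u] * d / c for d, c in zip(demands[u], total))
        u = min(demands, key=dom)    # user furthest behind gets the next task
        used = [sum(tasks[v] * demands[v][i] for v in demands) for i in range(2)]
        if any(used[i] + demands[u][i] > total[i] for i in range(2)):
            return tasks             # next task doesn't fit: stop
        tasks[u] += 1

alloc = drf_allocate(TOTAL, DEMANDS)
print(alloc)   # A gets 200 tasks, B gets 120: both dominant shares are 60%
```

A's dominant resource is memory (3/1000 per task) and B's is CPU (5/1000 per task); equalizing dominant shares under the CPU constraint gives 200 and 120 tasks respectively.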

Resource Allocation
- Goal: meet the QoS requirements of a software component (Cell)
  - Behavior tracked through application-specific heartbeats and system-level monitoring
  - Dynamic exploration of performance space to find operation points
- Complications:
  - Many cells with conflicting requirements
  - Finite resources
  - Hierarchy of resource ownership
  - Context-dependent resource availability
  - Stability, efficiency, rate of convergence
- Resource discovery, access control, advertisement
(Figure: allocated Cells send heartbeats into a loop of modeling, state evaluation, and adaptation.)

Performance-Aware Convex Optimization for Resource Allocation (PACORA)
- Express the problem as an optimization with constraints
  - Maximize throughput, performance, battery life, etc.
  - Tune for QoS
- Measure performance as a function of resources for each application
  - Create a performance model
  - Refine the performance function from application history
- Combine resource-value functions and QoS requirements to make resource decisions
  - Each application is given enough resources for its QoS
  - Turn off resources that don't improve the system
- Developers don't need to understand resources
- Basic solution to the impedance-mismatch problem

Tackling Multiple Requirements: Express as a Convex Optimization Problem
- Resource-value functions Runtime_i(r(0,i), ..., r(n-1,i)) model each application's runtime (e.g. speech, stencil, graph) as a function of its resources
- Penalty functions Penalty_i encode each application's QoS requirements
- Continuously minimize the total penalty of the system, subject to restrictions on the total amount of resources

Brokering Service: The Hierarchy of Ownership
- Discover resources in a domain
  - Devices, services, other brokers
  - Resources self-describing?
- Allocate and distribute resources to Cells that need them
  - Solve the impedance-mismatch problem
  - Dynamically optimize execution
  - Hand out Service-Level Agreements (SLAs) to Cells
  - Deny admission to Cells when it would violate existing agreements
- Complete hierarchy: throughout the world graph of applications
(Figure: a local broker between parent, sibling, and child brokers.)
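The PACORA formulation can be illustrated with a toy penalty minimization. The runtime model w/r, the application names, and all numbers below are made-up illustrations, not PACORA's actual models:

```python
# Toy PACORA-style decision: pick the split of a fixed resource pool
# that minimizes the total penalty, where each app's runtime is modeled
# as (hypothetical) work / resources and the penalty is total runtime.
TOTAL_UNITS = 10
WORK = {"speech": 8.0, "graph": 18.0}   # invented per-app work amounts

def runtime(app, r):
    return WORK[app] / r                # simple resource-value model

def total_penalty(r_speech):
    r_graph = TOTAL_UNITS - r_speech
    # Real PACORA uses richer QoS-derived penalty functions; here the
    # "penalty" is just the sum of the runtimes.
    return runtime("speech", r_speech) + runtime("graph", r_graph)

best = min(range(1, TOTAL_UNITS), key=total_penalty)
print(best, total_penalty(best))        # 4 units to speech, penalty 5.0
```

Brute force stands in for the convex solver: with these models, giving 4 units to "speech" and 6 to "graph" minimizes the total penalty.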

Space-Time Partitioning
- Spatial partitioning: performance isolation
  - Each partition receives a vector of basic resources:
    - A number of HW threads
    - A chunk of physical memory
    - A portion of shared cache
    - A fraction of memory bandwidth
    - Shared fractions of services
- Partitioning varies over time
  - Fine-grained multiplexing and guarantee of resources
  - Resources are gang-scheduled
- Controlled multiplexing, not uncontrolled virtualization
- Partitioning adapted to the system's needs

Efficient Space-Time Partitioning: Communication-Avoiding Gang Scheduling
(Figure: each core runs its own multiplexer with a gang-scheduling algorithm, producing identical or consistent schedules; some cores are not multiplexed.)
- Supports a variety of Cell types with low overhead
- Cross between EDF (Earliest Deadline First) and CBS (Constant Bandwidth Server)
- Multiplexers do not communicate because they use synchronized clocks with sufficiently high precision

Adaptive, Second-Level Preemptive Scheduling Framework
- PULSE: Preemptive User-Level SchEduling framework for adaptive, preemptive schedulers:
  - Dedicated access to processor resources
  - Timer callback and event delivery
  - User-level virtual memory mapping
  - User-level device control
- Auxiliary Scheduler:
  - Interface with the policy service
  - Runs outstanding scheduler contexts past synchronization points when resizing happens
- 2nd-level schedulers are not aware of the existence of the Auxiliary Scheduler, but receive resize events
- A number of adaptive schedulers have already been built: Round-Robin, EDF, CBS, Speed Balancing

Example: Tessellation GUI Service
- QoS-aware scheduler operates on user-meaningful actions
  - E.g. draw frame, move window
- Service-time guarantees (soft real-time)
- Differentiated service per application
  - E.g. text editor vs. video
- Performance isolation from other applications

GUI Service Architecture

Composite Resource with Greatly Improved QoS
(Figure: frame rates under Nano-X/Linux vs. Nano-X/GuiServ(1)/Tess, GuiServ(2)/Tess, and GuiServ(4)/Tess.)

Architecture of Tessellation OS
- Cell creation and resizing requests from users
  - Major change requests go through admission control (ACK/NACK) to the Policy Service; minor changes go to the Resource Allocation and Adaptation Mechanism
- Space-Time Resource Graph (STRG): all resources grouped into Cells, each with a fraction of resources
- Inputs: offline models and behavioral parameters; global policies and user policies/preferences; online performance monitoring, model building, and prediction (current resources)
- Tessellation Kernel (trusted):
  - Partition Mapping and Multiplexing Layer: STRG validator, resource planner, partition multiplexing
  - Partition Mechanism Layer: partition implementation, QoS enforcement, channel authenticator
- Partitionable (and trusted) hardware: cores, NICs, TPM/crypto, network bandwidth, cache/local store, physical memory, performance counters

Administrivia
- Moved code deadline for Lab 2 to Saturday
  - Design documents can be turned in on Sunday
  - Don't forget to do group evaluations!
- Still working on grading the exam!
  - I've posted the exam for those of you who want to look at it
- Attendance at sections and design reviews is not optional!
  - We will be taking attendance on all future Friday sections; lack of attendance will be counted against you
- Communication and meetings with members of your group are also not optional
  - I expect that every group should meet at least 3 times (once a week) during the course of a project
  - If anyone is having trouble getting together with your groups, please talk to me

How to Evaluate a Scheduling Algorithm?
- Deterministic modeling
  - Takes a predetermined workload and computes the performance of each algorithm for that workload
- Queueing models
  - Mathematical approach for handling stochastic workloads
- Implementation/Simulation
  - Build a system which allows actual algorithms to be run against actual data; most flexible/general

A Final Word on Scheduling
- When do the details of the scheduling policy and fairness really matter?
  - When there aren't enough resources to go around

- When should you simply buy a faster computer?
  - Assuming you're paying for worse response time in reduced productivity, customer angst, etc.
  - One approach: buy it when it will pay for itself in improved response time
  - You might think that you should buy a faster X (or network link, or expanded highway, or ...) when X is utilized 100%, but usually response time goes to infinity as utilization approaches 100%
(Figure: response time vs. utilization; nearly flat at low utilization, rising sharply toward infinity near 100%.)
- An interesting implication of this curve: most scheduling algorithms work fine in the linear portion of the load curve and fail otherwise
  - Argues for buying a faster X when you hit the knee of the curve

Virtualizing Resources
- Physical reality: different processes/threads share the same hardware
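The shape of that load curve falls out of simple queueing theory. As a sketch (assuming an M/M/1-style model, which the lecture does not specify), mean response time grows as S / (1 - U) for service time S and utilization U:

```python
# Why response time "goes to infinity" as utilization approaches 100%:
# in an M/M/1 queue the mean response time is S / (1 - U).
def response_time(service_time, utilization):
    assert 0 <= utilization < 1
    return service_time / (1.0 - utilization)

for u in (0.5, 0.9, 0.99):
    # roughly 2, 10, and 100 time units for a 1-unit service time
    print(u, response_time(1.0, u))
```

At 50% utilization you pay only a 2x slowdown; past the knee, each extra point of utilization costs far more than the last.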

- Need to multiplex CPU (just finished: scheduling)
- Need to multiplex use of memory (today)
- Need to multiplex disk and devices (later in term)
- Why worry about memory sharing?
  - The complete working state of a process and/or kernel is defined by its data in memory (and registers)
  - Consequently, we cannot just let different threads of control use the same memory
    - Physics: two different pieces of data cannot occupy the same locations in memory
  - Probably don't want different threads to even have access to each other's memory

Important Aspects of Memory Multiplexing
- Controlled overlap:
  - Separate state of threads should not collide in physical memory. Obviously, unexpected overlap causes chaos!
  - Conversely, we would like the ability to overlap when desired (for communication)
- Translation:
  - Ability to translate accesses from one address space (virtual) to a different one (physical)
  - When translation exists, the processor uses virtual addresses while physical memory uses physical addresses
  - Side effects: can be used to avoid overlap; can be used to give a uniform view of memory to programs
- Protection:
  - Prevent access to private memory of other processes
  - Different pages of memory can be given special behavior (read-only, invisible to user programs, etc.)

Binding of Instructions and Data to Memory
- Binding of instructions and data to addresses: choose addresses for instructions and data from the standpoint of the processor

    data1:   dw    32
    ...
    start:   lw    r1,0(data1)
    jal checkit
    loop:    addi  r1, r1, -1
             bnz   r1, r0, loop
    ...
    checkit: ...

  Assembled (address: contents):

    0x300:  00000020
    0x900:  8C2000C0
    0x904:  0C000340
    0x908:  2021FFFF
    0x90C:  1420FFFF
    0xD00:  (checkit)

- Could we place data1, start, and/or checkit at different addresses?
  - Yes. When? Compile time / load time / execution time
- Related: which physical memory locations hold which instructions and data?

Multi-step Processing of a Program for Execution
- Preparation of a program for execution involves components at:
  - Compile time (i.e. gcc)
  - Link/load time (UNIX ld does the link)
  - Execution time (e.g. dynamic libraries)

- Addresses can be bound to final values anywhere in this path
  - Depends on hardware support
  - Also depends on the operating system
- Dynamic libraries
  - Linking postponed until execution
  - A small piece of code, the stub, is used to locate the appropriate memory-resident library

Recall: Uniprogramming
- Uniprogramming (no translation or protection)
  - Application always runs at the same place in physical memory, since there is only one application at a time
  - Application can access any physical address
(Figure: the operating system at the top of the valid 32-bit address range (up to 0xFFFFFFFF), the application at the bottom (from 0x00000000).)

- Application is given the illusion of a dedicated machine by giving it the reality of a dedicated machine

Multiprogramming (First Version)
- Multiprogramming without translation or protection
  - Must somehow prevent address overlap between threads
(Figure: Operating System at the top of memory, Application2 at 0x00020000, Application1 at 0x00000000.)
  - Trick: use the loader/linker to adjust addresses while the program is loaded into memory (loads, stores, jumps)

Multiprogramming (Version with Protection)
- Can we protect programs from each other without translation?
(Figure: as before, with Base = 0x20000 and Limit = Base + 0x10000 bracketing Application2.)
  - Yes: use two special registers, Base and Limit, to prevent the user from straying outside the designated area
  - If the user tries to access an illegal address, cause an error

Two Views of Memory with Dynamic Translation
(Figure: the CPU issues virtual addresses, which the MMU converts to physical addresses; some reads and writes go untranslated.)
- Two views of memory:
  - View from the CPU (what the program sees: virtual memory)
  - View from memory (physical memory)
- The translation box converts between the two views
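The base-and-limit protection check above can be sketched in a few lines (a toy model of what the hardware does on every access, using the slide's register values):

```python
# Protection without translation: the loader already relocated the
# program into [BASE, LIMIT); hardware just checks every access.
BASE  = 0x20000
LIMIT = BASE + 0x10000   # values from the slide's figure

def check_access(addr):
    if not (BASE <= addr < LIMIT):
        raise MemoryError("protection violation: trap to OS")
    return addr

check_access(0x20004)    # inside the designated area: fine
# check_access(0x31000) would raise: outside this program's region
```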

- Translation helps to implement protection
  - If task A cannot even gain access to task B's data, there is no way for A to adversely affect B
- With translation, every program can be linked/loaded into the same region of the user address space
  - Overlap avoided through translation, not relocation

Simple Segmentation: Base and Limit registers (CRAY-1)
(Figure: the CPU's virtual address is added to Base and checked against Limit to form the physical address.)
- Program gets the illusion of its own dedicated machine, with memory starting at 0
- Program gets a continuous region of memory
- Addresses within the program do not have to be relocated when the program is placed in a different region of DRAM
- User may have multiple segments available (e.g. x86)
  - Loads and stores include the segment ID in the opcode
  - x86 example: mov [es:bx],ax

Issues with the simple segmentation method
(Figure: over time, processes 2, 9, and 10 come and go around processes 5 and 6, leaving fragmented gaps in memory.)
- Fragmentation problem
  - Not every process is the same size
  - Over time, memory space becomes fragmented
- Hard to do inter-process sharing
  - Want to share code segments when possible
  - Want to share memory between processes
  - Helped by providing multiple segments per process
- Need enough physical memory for every process

More Flexible Segmentation
(Figure: the user's view of memory as separate segments 1-4 maps onto scattered regions of physical memory.)

- Logical view: multiple separate segments
  - Typical: code, data, stack
  - Others: memory sharing, etc.
- Each segment is given a region of contiguous memory
  - Has a base and limit
  - Can reside anywhere in physical memory

Implementation of Multi-Segment Model
(Figure: the segment # portion of the virtual address indexes a table of valid bits and base/limit pairs; the offset is checked against the limit (error if out of range) and added to the base to form the physical address.)

- Segment map resides in the processor
  - Segment number mapped into a base/limit pair
  - Base added to offset to generate the physical address
  - Error check catches offset out of range
- As many chunks of physical memory as entries
  - Segment addressed by a portion of the virtual address
  - However, it could be included in the instruction instead:
    - x86 example: mov [es:bx],ax

Intel x86 Special Registers
(Figure: the 80386 special registers and a typical segment register; the current priority is the RPL of the code segment (CS).)

Example: Four Segments (16-bit addresses)
- Virtual address format: bits 15-14 are the segment ID, bits 13-0 the offset

    Seg ID #     Base    Limit
    0 (code)     0x4000  0x0800
    1 (data)     0x4800  0x1400
    2 (shared)   0xF000  0x1000
    3 (stack)    0x0000  0x3000

(Figure: the virtual address space (segments at 0x0000, 0x4000, 0x8000, 0xC000) maps into the physical address space at 0x4000, 0x4800-0x5C00, and 0xF000; the shared segment might be shared with other apps, and the gaps leave space for other apps.)

Example of segment translation

    0x240   main:   la   $a0, varx
    0x244           jal  strlen
    ...
    0x360   strlen: li   $v0, 0      ;count
    0x364   loop:   lb   $t0, ($a0)
    0x368           beq  $r0, $t1, done
    ...
    0x4050  varx:   dw   0x314159

    Seg ID #     Base    Limit
    0 (code)     0x4000  0x0800
    1 (data)     0x4800  0x1400
    2 (shared)   0xF000  0x1000
    3 (stack)    0x0000  0x3000

Let's simulate a bit of this code to see what happens (PC = 0x240):
1. Fetch 0x240. Virtual segment #? 0; offset? 0x240.
   Physical address? Base = 0x4000, so physical addr = 0x4240.
   Fetch instruction at 0x4240. Get "la $a0, varx".
   Move 0x4050 into $a0; move PC+4 into PC.
2. Fetch 0x244. Translated to physical = 0x4244. Get "jal strlen".
   Move 0x0248 into $ra (the return address!); move 0x0360 into PC.
3. Fetch 0x360. Translated to physical = 0x4360. Get "li $v0, 0".
   Move 0x0000 into $v0; move PC+4 into PC.
4. Fetch 0x364. Translated to physical = 0x4364. Get "lb $t0, ($a0)".
   Since $a0 is 0x4050, try to load a byte from 0x4050.
   Translate 0x4050. Virtual segment #? 1; offset? 0x50.
   Physical address? Base = 0x4800, so physical addr = 0x4850.
   Load the byte from 0x4850 into $t0; move PC+4 into PC.
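The translation steps in this walkthrough can be mirrored in a short sketch (a toy model of the hardware, using the segment table from the slide):

```python
# 16-bit virtual addresses: top two bits pick a segment, low 14 bits
# are the offset, which is bounds-checked and added to the base.
SEG_TABLE = {               # seg id: (base, limit), from the slide
    0: (0x4000, 0x0800),    # code
    1: (0x4800, 0x1400),    # data
    2: (0xF000, 0x1000),    # shared
    3: (0x0000, 0x3000),    # stack
}

def translate(vaddr):
    seg, offset = vaddr >> 14, vaddr & 0x3FFF
    base, limit = SEG_TABLE[seg]
    if offset >= limit:
        raise MemoryError("segmentation fault: offset out of range")
    return base + offset

print(hex(translate(0x0240)))   # 0x4240: the instruction fetch of main
print(hex(translate(0x4050)))   # 0x4850: the load of varx in the data segment
```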

Observations about Segmentation
- Virtual address space has holes
  - Segmentation is efficient for sparse address spaces
  - A correct program should never address gaps (except as mentioned in a moment)
    - If it does, trap to kernel and dump core
- When is it OK to address outside the valid range?
  - This is how the stack and heap are allowed to grow
  - For instance, the stack takes a fault and the system automatically increases the size of the stack
- Need a protection mode in the segment table
  - For example, the code segment would be read-only
  - Data and stack would be read-write (stores allowed)
  - Shared segment could be read-only or read-write
- What must be saved/restored on a context switch?
  - Segment table stored in CPU, not in memory

Schematic View of Swapping

- Extreme form of context switch: swapping
  - In order to make room for the next process, some or all of the previous process is moved to disk
    - Likely need to send out complete segments
  - This greatly increases the cost of context-switching
- Desirable alternative?
  - Some way to keep only the active portions of a process in memory at any one time
  - Need finer-granularity control over physical memory

Paging: Physical Memory in Fixed-Size Chunks
- Problems with segmentation?
  - Must fit variable-sized chunks into physical memory
  - May move processes multiple times to fit everything
  - Limited options for swapping to disk
- Fragmentation: wasted space
  - External: free gaps between allocated chunks
  - Internal: don't need all memory within allocated chunks
- Solution to fragmentation from segments?
  - Allocate physical memory in fixed-size chunks (pages)
  - Every chunk of physical memory is equivalent
    - Can use a simple vector of bits to handle allocation: 00110001110001101110010
    - Each bit represents a page of physical memory: 1 = allocated, 0 = free
- Should pages be as big as our previous segments?

Multiprogramming (Translation and Protection, Version 2)
- Problem: run multiple applications in such a way that they are protected from one another
- Goals:
  - Isolate processes and kernel from one another
  - Allow flexible translation that:
    - Doesn't lead to fragmentation
    - Allows easy sharing between processes
    - Allows only part of a process to be resident in physical memory
- (Some of the required) hardware mechanisms:
  - General address translation
    - Flexible: can fit physical chunks of memory into arbitrary places in the user's address space
    - Not limited to a small number of segments
    - Think of this as providing a large number (thousands) of fixed-sized segments (called pages)
  - Dual-mode operation
    - Protection base involving the kernel/user distinction
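The bit-vector allocation mentioned above can be sketched as a first-fit scan (a toy model; real kernels keep the bitmap packed into machine words):

```python
# One bit per physical page: 1 = allocated, 0 = free.
bitmap = [1, 1, 0, 1, 0, 0, 1]   # pages 2, 4, and 5 are free

def alloc_page(bm):
    for page, bit in enumerate(bm):
        if bit == 0:
            bm[page] = 1         # mark the page allocated
            return page
    raise MemoryError("out of physical pages")

print(alloc_page(bitmap))   # 2: first free page
print(alloc_page(bitmap))   # 4: next free page
```

Because every page is equivalent, any free bit will do; there is no external fragmentation to work around.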

How to Implement Paging?
(Figure: the virtual page number is bounds-checked against PageTableSize and indexes the page table; the PTE supplies the physical page number, which is concatenated with the offset to form the physical address after a permission check. Example PTEs: V,R; V,R; V,R,W; V,R,W; N; V,R,W.)
- Page table (one per process)
  - Resides in physical memory
  - Contains the physical page and permissions for each virtual page
    - Permissions include: valid bit, read, write, etc.
- Virtual address mapping
  - Offset from the virtual address is copied to the physical address
    - Example: a 10-bit offset means 1024-byte pages
  - Virtual page # is all the remaining bits
    - Example for 32 bits: 32 - 10 = 22 bits, i.e. 4 million entries
  - Physical page # is copied from the table into the physical address
  - Check page table bounds and permissions

What about Sharing?
(Figure: process A's page table and process B's page table each contain an entry pointing at the same physical page, the "shared page", so that page appears in the address space of both processes.)
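The single-level translation above can be sketched directly (a toy model: a dict stands in for the hardware table, and the PTE contents are invented):

```python
# 10-bit offset (1024-byte pages); the rest of the address is the
# virtual page number. Entries are (physical page, perms) tuples.
OFFSET_BITS = 10

page_table = {0: (5, "rw"), 1: (2, "r")}   # vpn -> (ppn, perms)

def translate(vaddr, write=False):
    vpn    = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise MemoryError("page fault: invalid PTE")
    ppn, perms = page_table[vpn]
    if write and "w" not in perms:
        raise MemoryError("protection fault: page is read-only")
    return (ppn << OFFSET_BITS) | offset   # concatenate ppn and offset

print(hex(translate(0x0433)))   # vpn 1, offset 0x33 -> 0x833
```

Sharing drops out for free: two processes' tables can simply hold the same physical page number.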

Simple Page Table Discussion
- What needs to be switched on a context switch?
  - Page table pointer and limit
- Simple page table analysis
  - Pros:
    - Simple memory allocation
    - Easy to share
  - Con: what if the address space is sparse?
    - E.g. on UNIX, code starts at 0 and the stack starts at (2^31 - 1)
    - With 1K pages, need 4 million page table entries!
  - Con: what if the table is really big?
    - Not all pages are used all the time; it would be nice to have just the working set of the page table in memory
- How about combining paging and segmentation?
  - Segments with pages inside them?
  - Need some sort of multi-level translation

What is in a Page Table Entry (PTE)?
- What is in a page table entry (or PTE)?
  - Pointer to the next-level page table or to the actual page
  - Permission bits: valid, read-only, read-write, write-only
- Example: Intel x86 architecture PTE:
  - Address has the same format as the previous slide (10, 10, 12-bit offset)
  - Intermediate page tables are called Directories

    Bits 31-12: Page Frame Number (physical page number)
    Bits 11-9: Free (OS); 8: 0; 7: L; 6: D; 5: A; 4: PCD; 3: PWT; 2: U; 1: W; 0: P

  - P: Present (same as the valid bit in other architectures)
  - W: Writeable
  - U: User accessible
  - PWT: Page write transparent: external cache write-through
  - PCD: Page cache disabled (page cannot be cached)

Examples of how to exploit a PTE
- How do we use the PTE?
  - An invalid PTE can imply different things:
    - The region of the address space is actually invalid, or
    - The page/directory is just somewhere else than memory
  - Validity is checked first

- The OS can use the other (say) 31 bits for location info
- Usage example: demand paging
  - Keep only active pages in memory
  - Place the others on disk and mark their PTEs invalid
- Usage example: copy-on-write
  - UNIX fork gives a copy of the parent's address space to the child
    - Address spaces are disconnected after the child is created
  - How to do this cheaply?
    - Make a copy of the parent's page tables (pointing at the same memory)
    - Mark entries in both sets of page tables as read-only
    - A page fault on write creates two copies
- Usage example: zero-fill on demand
  - New data pages must carry no information (say, be zeroed)
  - Mark PTEs as invalid; a page fault on use gets a zeroed page

Multi-level Translation: Segments + Pages
- What about a tree of tables?
  - Lowest level: page table; memory still allocated with a bitmap
  - Higher levels often segmented
- Could have any number of levels. Example (top level is a segment):
(Figure: the segment # selects a valid bit and base/limit pair bounding a page table; the page # indexes that table, and the PTE's physical page # is concatenated with the offset, with access and bounds errors checked along the way.)
- What must be saved/restored on a context switch?
  - Contents of the top-level segment registers (for this example)

What about Sharing (Complete Segment)?
(Figure: process A and process B each have a segment table entry (base/limit) pointing at the same page table, the "shared segment", so the whole segment is shared.)

Another common example: two-level page table
(Figure: the 32-bit virtual address splits into a 10-bit P1 index, a 10-bit P2 index, and a 12-bit offset; the PageTablePtr register points to a 4KB root table of 4-byte entries, each of which points to a 4KB second-level table.)
- Tree of page tables
- Tables are of fixed size (1024 entries)
  - On a context switch: save a single PageTablePtr register
- Valid bits on page table entries
  - Don't need every 2nd-level table

Multi-level Translation Analysis
- Pros:
  - Only need to allocate as many page table entries as we need for the application
    - In other words, sparse address spaces are easy
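The two-level walk just described can be sketched as follows (a toy model: dicts stand in for the fixed 1024-entry hardware tables, and the mappings are invented):

```python
# 10/10/12 split: top ten bits index the root table, the next ten an
# optional second-level table, and the low twelve bits are the offset.
root = {0: {3: 0x42}, 1023: {7: 0x99}}   # p1 -> (p2 -> physical frame)

def walk(vaddr):
    p1     = (vaddr >> 22) & 0x3FF
    p2     = (vaddr >> 12) & 0x3FF
    offset = vaddr & 0xFFF
    second = root.get(p1)
    if second is None or p2 not in second:
        raise MemoryError("page fault: missing level or invalid PTE")
    return (second[p2] << 12) | offset

print(hex(walk(0x0000_3ABC)))   # p1=0, p2=3 -> frame 0x42 -> 0x42abc
```

Missing second-level tables cost no memory at all, which is exactly why sparse address spaces are easy.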

  - Easy memory allocation
  - Easy sharing
    - Share at the segment or page level (needs additional reference counting)
- Cons:
  - One pointer per page (typically 4K - 16K pages today)
  - Page tables need to be contiguous
    - However, the previous example keeps tables to exactly one page in size
  - Two (or more, if > 2 levels) lookups per reference
    - Seems very expensive!

Inverted Page Table
- With all previous examples ("forward page tables")
  - The size of the page table is at least as large as the amount of virtual memory allocated to processes
  - Physical memory may be much less
    - Much of the process space may be out on disk or not in use
(Figure: the virtual page # is hashed through a hash table to find the physical page #, which is concatenated with the offset.)
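A hash-based lookup of this sort might look like the following sketch (a toy model: Python's dict stands in for the hash table, and the entries are invented):

```python
# Inverted page table keyed by (process id, virtual page number):
# its size tracks physical memory, not the virtual address space.
PAGE_BITS = 12

inverted = {(1, 0x7FFF0): 0x12, (2, 0x00003): 0x13}   # (pid, vpn) -> frame

def translate(pid, vaddr):
    vpn    = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    frame  = inverted.get((pid, vpn))
    if frame is None:
        raise MemoryError("page fault")
    return (frame << PAGE_BITS) | offset

print(hex(translate(1, 0x7FFF0234)))   # 0x12234
```

Even though process 1 maps a page near the top of a huge virtual space, the table holds only one entry for it.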

- Answer: use a hash table
  - Called an inverted page table
  - Size is independent of the virtual address space
  - Directly related to the amount of physical memory
  - Very attractive option for 64-bit address spaces
- Cons: complexity of managing hash changes

Recall: Caching Concept
- Cache: a repository for copies that can be accessed more quickly than the original
  - Make the frequent case fast and the infrequent case less dominant
- Caching underlies many of the techniques that are used today to make computers fast
  - Can cache: memory locations, address translations, pages, file blocks, file names, network routes, etc.
- Only good if:
  - The frequent case is frequent enough, and
  - The infrequent case is not too expensive
- Important measure: Average Access Time = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)
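That average-access-time formula is worth making concrete. The hit rate and timings below are invented for illustration:

```python
# AMAT = hit_rate * hit_time + miss_rate * miss_time
def amat(hit_rate, hit_time, miss_time):
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_time

# E.g. a 99%-hit translation cache with a 1-cycle hit and a
# 100-cycle page-table walk averages about 1.99 cycles per access.
print(amat(0.99, 1, 100))
```

The frequent case dominates only when misses are rare enough: at a 90% hit rate the same cache would average nearly 11 cycles.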

Why does caching matter for Virtual Memory?
(Figure: the multi-level translation from before: segment # to base/limit, page # to PTE, offset concatenation, with access errors checked along the way.)
- Cannot afford to translate on every access
  - At least three DRAM accesses per actual DRAM access
  - Or: perhaps I/O if the page table is partially on disk!
- Even worse: what if we are using caching to make memory access faster than DRAM access?
- Solution? Cache translations!

Caching Applied to Address Translation
(Figure: the CPU sends a virtual address to the TLB; on a hit the cached physical address goes straight to physical memory, and on a miss the MMU translates via the page table and the result is saved in the TLB; data reads and writes are untranslated.)
- The question is one of page locality: does it exist?
  - Instruction accesses spend a lot of time on the same page (since accesses are sequential)
  - Stack accesses have definite locality of reference
  - Data accesses have less page locality, but still some
- Can we have a TLB hierarchy?

What Actually Happens on a TLB Miss?
- Hardware-traversed page tables:
  - On a TLB miss, hardware in the MMU looks at the current page table to fill the TLB (may walk multiple levels)

    - If the PTE is valid, the hardware fills the TLB and the processor never knows
    - If the PTE is marked invalid, it causes a Page Fault, after which the kernel decides what to do
- Software-traversed page tables (like MIPS)
  - On a TLB miss, the processor receives a TLB fault
  - The kernel traverses the page table to find the PTE
    - If the PTE is valid, it fills the TLB and returns from the fault
    - If the PTE is marked invalid, it internally calls the Page Fault handler
- Most chip sets provide hardware traversal
- Modern operating systems tend to have more TLB faults since they use translation for many things
  - Examples: shared segments, user-level portions of an operating system

What happens on a Context Switch?
- Need to do something, since TLBs map virtual addresses to physical addresses
  - The address space just changed, so TLB entries are no longer valid!
- Options?
  - Invalidate the TLB: simple but might be expensive
    - What if switching frequently between processes?
  - Include a ProcessID in the TLB
    - This is an architectural solution: needs hardware
- What if the translation tables change?
  - For example, to move a page from memory to disk or vice versa
  - Must invalidate the TLB entry!
    - Otherwise, we might think that the page is still in memory!

What TLB organization makes sense?
(Figure: CPU, then TLB, then cache, then memory.)
- Needs to be really fast
  - Critical path of memory access
  - In the simplest view: before the cache
    - Thus, this adds to access time (reducing cache speed)
  - Seems to argue for direct-mapped or low associativity
- However, needs to have very few conflicts!
  - With a TLB, the miss time is extremely high!
  - This argues that the cost of a conflict (miss time) is much higher than the slightly increased cost of access (hit time)
  - Thrashing: continuous conflicts between accesses
- What if we use the low-order bits of the page number as the index into the TLB?
  - The first pages of code, data, and stack may map to the same entry
  - Need 3-way associativity at least?

TLB organization: include protection
- How big does the TLB actually have to be?
  - Usually small: 128-512 entries
  - Not very big; can support higher associativity
- TLB usually organized as a fully-associative cache
  - Lookup is by virtual address
  - Returns physical address plus other info
- What happens when fully-associative is too slow?
  - Put a small (4-16 entry) direct-mapped cache in front
  - Called a TLB Slice
- Example TLB entries for the MIPS R3000:

    Virtual Address  Physical Address  Dirty  Ref  Valid  Access  ASID
    0xFA00           0x0003            Y      N    Y      R/W     34
    0x0040           0x0010            N      Y    Y      R       0
    0x0041           0x0011            N      Y    Y      R       0

Example: R3000 pipeline includes TLB stages
- MIPS R3000 pipeline: Inst Fetch (TLB, I-Cache); Dcd/Reg (RF); ALU/E.A.; Memory Operation (E.A. TLB, D-Cache); Write Reg (WB)
- TLB: 64 entries, on-chip, fully associative, software TLB fault handler
- Virtual address space: 6-bit ASID, 20-bit virtual page number, 12-bit offset
  - 0xx: user segment (caching based on PT/TLB entry)
  - 100: kernel physical space, cached
  - 101: kernel physical space, uncached
  - 11x: kernel virtual space
- Allows context switching among 64 user processes without a TLB flush
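The ASID idea above can be sketched as a tagged lookup (a toy model: a dict stands in for the fully-associative hardware array, and the entries are invented):

```python
# Tagging TLB entries with an address-space ID lets the OS context-switch
# without flushing: the same virtual page under a different ASID misses.
PAGE_BITS = 12

tlb = {}                                  # (asid, vpn) -> frame

def tlb_insert(asid, vpn, frame):
    tlb[(asid, vpn)] = frame

def tlb_lookup(asid, vaddr):
    vpn = vaddr >> PAGE_BITS
    frame = tlb.get((asid, vpn))
    if frame is None:
        raise LookupError("TLB miss: walk the page table")
    return (frame << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

tlb_insert(asid=3, vpn=0x00400, frame=0x0F0)
print(hex(tlb_lookup(3, 0x0040_0ABC)))    # hit: 0xf0abc
# tlb_lookup(4, 0x0040_0ABC) would raise: same vaddr, different ASID
```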

Reducing translation time further

As described, the TLB lookup is in serial with the cache lookup:

Virtual Address [V page no. | offset (10)] → TLB Lookup (Valid, Access Rights, PA) → Physical Address [P page no. | offset (10)]

Machines with TLBs go one step further: they overlap the TLB lookup with the cache access. This works because the offset is available early. (Lec 13.57)

Overlapping TLB & Cache Access

Here is how this might work with a 4K cache:
- The 20-bit page # goes to the TLB (associative lookup) while the 10-bit index and 2-bit displacement from the offset index a 1K-entry cache of 4-byte lines (4K total)
- The frame number (FN) from the TLB is compared against the FN in the cache tag to determine Hit/Miss
What if the cache size is increased to 8KB?
- Overlap not complete
- Need to do something else. See CS152/252
Another option: Virtual Caches
- Tags in the cache are virtual addresses
- Translation only happens on cache misses (Lec 13.58)

Summary (1/2)

Memory is a resource that must be shared
- Controlled Overlap: only shared when appropriate
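The overlap trick is just bit arithmetic: the cache can be indexed in parallel with translation only if its index and displacement bits fit inside the untranslated page offset. A small sketch of that check, assuming 4KB pages and 4-byte cache lines as on the slide:

```python
# Why TLB/cache overlap works for a 4KB cache but breaks at 8KB
# (illustrative arithmetic; assumes 4KB pages, 4-byte lines).
import math

PAGE_OFFSET_BITS = 12  # 4KB pages: these bits need no translation

def overlap_complete(cache_bytes, line_bytes=4):
    lines = cache_bytes // line_bytes
    index_bits = int(math.log2(lines))
    disp_bits = int(math.log2(line_bytes))
    # Overlap is complete only if index + displacement fit
    # entirely within the page offset.
    return index_bits + disp_bits <= PAGE_OFFSET_BITS

assert overlap_complete(4 * 1024)      # 10 index + 2 disp = 12 bits: fits
assert not overlap_complete(8 * 1024)  # 11 + 2 = 13 bits: one indexed bit
                                       # would come from the translated page #
```

At 8KB, one index bit falls in the virtual page number, so the cache line cannot be selected until the TLB produces the physical frame, which is exactly why the overlap is "not complete."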

- Translation: Change Virtual Addresses into Physical Addresses
- Protection: Prevent unauthorized Sharing of resources
Segment Mapping
- Segment registers within processor
- Segment ID associated with each access
  - Often comes from portion of virtual address
  - Can come from bits in instruction instead (x86)
- Each segment contains base and limit information
  - Offset (rest of address) adjusted by adding base
Page Tables
- Memory divided into fixed-sized chunks of memory
- Virtual page number from virtual address mapped through page table to physical page number
- Offset of virtual address same as physical address
- Large page tables can be placed into virtual memory
Multi-Level Tables
- Virtual address mapped to a series of tables
- Permit sparse population of address space
Inverted Page Table
- Size of page table related to physical memory size (Lec 13.59)

Summary (2/2)

PTE: Page Table Entries
- Includes physical page number
- Control info (valid bit, writeable, dirty, user, etc.)
A cache of translations called a Translation Lookaside Buffer (TLB)
- Relatively small number of entries (< 512)
- Fully Associative (since conflict misses are expensive)
- TLB entries contain a PTE and optional process ID
On a TLB miss, the page table must be traversed
- If the located PTE is invalid, cause a Page Fault
On context switch/change in page table
- TLB entries must be invalidated somehow
The TLB is logically in front of the cache
- Thus, needs to be overlapped with the cache access to be really fast (Lec 13.60)
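The multi-level mapping from the summary can be sketched in a few lines. This is an illustrative two-level walk with a 10/10/12 address split (x86-style 32-bit paging); the dict-based tables are stand-ins for real page-table pages, and they show how sparsely populated address spaces only allocate the lower-level tables they actually use:

```python
# Sketch of a two-level page table walk (10-bit L1 index,
# 10-bit L2 index, 12-bit offset; structures are illustrative).
def translate(root, vaddr):
    l1 = (vaddr >> 22) & 0x3FF   # top 10 bits index the L1 table
    l2 = (vaddr >> 12) & 0x3FF   # next 10 bits index the L2 table
    offset = vaddr & 0xFFF       # low 12 bits pass through untranslated

    l2_table = root.get(l1)      # sparse: most L1 slots have no L2 table
    if l2_table is None:
        raise KeyError("page fault: no L2 table")
    pfn = l2_table.get(l2)
    if pfn is None:
        raise KeyError("page fault: no mapping")
    return (pfn << 12) | offset

# Sparse address space: only one L2 table allocated.
root = {1: {2: 0x42}}
vaddr = (1 << 22) | (2 << 12) | 0x123
assert translate(root, vaddr) == (0x42 << 12) | 0x123
```

Note how the offset is carried through unchanged, matching the summary's point that the offset of the virtual address is the same as in the physical address.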

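Putting the summary's pieces together, the TLB-miss path can be sketched end to end: check the TLB, walk the page table on a miss, fault if the PTE is invalid, otherwise refill the TLB. This is a hedged sketch with illustrative structures, not any real kernel's refill handler:

```python
# Sketch of the TLB-miss path: TLB hit -> fast path; miss -> walk the
# page table; invalid PTE -> page fault; valid -> refill the TLB.
class PageFault(Exception):
    pass

def access(tlb, page_table, vpn):
    if vpn in tlb:                      # TLB hit: fast path
        return tlb[vpn]
    pte = page_table.get(vpn)           # TLB miss: traverse page table
    if pte is None or not pte["valid"]:
        raise PageFault(vpn)            # invalid PTE: page fault
    tlb[vpn] = pte["pfn"]               # refill TLB with the translation
    return pte["pfn"]

tlb = {}
page_table = {0x10: {"pfn": 0x99, "valid": True},
              0x11: {"pfn": 0x00, "valid": False}}

assert access(tlb, page_table, 0x10) == 0x99  # miss, walk, refill
assert 0x10 in tlb                            # cached for the next access
try:
    access(tlb, page_table, 0x11)             # invalid PTE
    assert False
except PageFault:
    pass
```

On a context switch, this simple (non-ASID) model would have to discard `tlb` entirely, which is the invalidation requirement the summary mentions.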