Arrakis: The Operating System is the Control Plane

Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, et al.
Presented by Jimmy You
EECS 582 F16

Background
Today's hardware is fast! Typical commodity server (Dell PowerEdge R520, ~$1000):
  • 10G NIC: ~2 us per 1 KB packet
  • 6-core CPU
  • RAID w/ 1G cache: ~25 us per 1 KB write

Background
But the data center is not as fast as the hardware.
% of processing time (Redis NoSQL):
  • read: 8.7 us
  • write: 163 us
[Figure: stacked bars from 0% to 100% splitting each operation's time among Hardware, Kernel, and App]

Background: Who is dragging?

System calls are slow:
  • epoll: 27% of read time
  • recv: 11% of read time
  • send: 37% of read time
  • fsync: 84% of write time

Motivation
Design goals:
  • Skip the kernel for data-plane operations (low overhead)
  • Retain classical server OS features (transparency)
  • Appropriate OS/hardware abstractions (virtualization)
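To see why skipping the kernel on the data path comes first among the design goals, the system call shares from the Background slides can be converted into absolute times. This is only a rough worked calculation: it assumes each percentage is a share of the whole operation (8.7 us per read, 163 us per write), and the names `SYSCALL_SHARE` and `syscall_time_us` are made up for the illustration.

```python
# Figures quoted in the Background slides (Redis NoSQL on Linux).
READ_US = 8.7     # total processing time of one read
WRITE_US = 163.0  # total processing time of one write

# Assumption: each share is a fraction of the whole operation's time.
SYSCALL_SHARE = {
    "read": {"epoll": 0.27, "recv": 0.11, "send": 0.37},
    "write": {"fsync": 0.84},
}


def syscall_time_us(total_us, shares):
    """Convert fractional shares into absolute microseconds."""
    return {name: total_us * share for name, share in shares.items()}


read_times = syscall_time_us(READ_US, SYSCALL_SHARE["read"])
write_times = syscall_time_us(WRITE_US, SYSCALL_SHARE["write"])

for op, times in (("read", read_times), ("write", write_times)):
    for name, us in times.items():
        print(f"{op:5s} {name:5s} {us:6.1f} us")

read_share = sum(SYSCALL_SHARE["read"].values())
print(f"read: listed system calls cover {read_share:.0%} of the time")
```

Under these assumptions, the three listed calls cover about three quarters of a read, and fsync alone accounts for roughly 137 us of a 163 us write.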

Hardware I/O virtualization
Already de facto on NICs:
  • Multiplexing: SR-IOV splits a NIC into virtual NICs, each w/ its own queues, registers, etc.
  • Protection: IOMMU (e.g. Intel VT-d): devices use the virtual memory of apps
  • Packet filter: control the I/O
  • I/O scheduling: rate limiting, packet scheduling
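The rate limiting listed under I/O scheduling is commonly built as a token bucket. The slides do not say which mechanism a given NIC uses, so the following is a minimal software sketch of that general technique, not a description of any hardware. `TokenBucket`, its parameters, and the explicit `now` timestamps are all assumptions made for the illustration.

```python
class TokenBucket:
    """Token-bucket rate limiter (hypothetical stand-in for a NIC rate limiter)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = float(rate_bytes_per_s)  # refill rate
        self.capacity = float(burst_bytes)   # maximum burst size
        self.tokens = float(burst_bytes)     # bucket starts full
        self.last = 0.0                      # time of the last update, in seconds

    def allow(self, now, size_bytes):
        """Return True if a packet of size_bytes may be sent at time now."""
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=2000)
decisions = [
    bucket.allow(0.0, 1500),  # fits in the initial burst
    bucket.allow(0.0, 1000),  # only 500 tokens left, so rejected
    bucket.allow(1.0, 1000),  # one second refills 1000 tokens, so accepted
]
print(decisions)
```

Passing the time in explicitly keeps the sketch deterministic; a real limiter would read a clock and would typically queue rather than reject packets.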

Traditional OS
[Figure: layer diagram with Apps, Libs, Kernel, and Hardware; boxes for API, Multiplexing, Naming, Resource limit, Access Ctrl, I/O Scheduling, Protection, and I/O Processing]

Skipping the kernel
[Figure: two slides repeating the layers and boxes of the Traditional OS diagram]

Skipping the kernel
[Figure: control plane vs. data plane. Labels: Apps, libos, Kernel, Virtual Interface, Hardware; control and data paths across user space and HW space]

Hardware Model
  • NICs (multiplexing, protection, scheduling)
  • Storage
      VSIC (Virtual Storage Interface Controller): each w/ queues etc.
      VSA (Virtual Storage Area): mapped to physical devices, associated with VSICs
      VSA & VSIC: many-to-many mapping

Control Plane Interface
  • VIC (Virtual Interface Card): apps can create/delete VICs and associate them with doorbells
  • Doorbells (like interrupts?): associated with events on VICs
  • Filter creation, e.g. create_filter(rx,*,tcp.port == 80)
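The control plane operations above (create a VIC, attach a doorbell, install a receive filter) can be sketched as a toy model. This is not the Arrakis API: `ToyNIC`, `ControlPlane`, `VIC`, their method names, and the dictionary packet format are all hypothetical, and the filter is written as a Python predicate standing in for an expression like `tcp.port == 80`.

```python
from collections import deque


class VIC:
    """Toy virtual interface card: one receive queue and an optional doorbell."""

    def __init__(self, name):
        self.name = name
        self.rx_queue = deque()
        self.doorbell = None  # callback fired when a packet arrives


class ToyNIC:
    """Stands in for the hardware data path: steers packets using installed filters."""

    def __init__(self):
        self.rx_filters = []  # list of (predicate, vic) pairs

    def deliver(self, packet):
        for predicate, vic in self.rx_filters:
            if predicate(packet):
                vic.rx_queue.append(packet)  # the kernel is not involved here
                if vic.doorbell is not None:
                    vic.doorbell(vic)
                return vic
        return None  # no filter matched, so the packet is dropped


class ControlPlane:
    """Hypothetical kernel-side control plane, used only during set-up."""

    def __init__(self, nic):
        self.nic = nic
        self.vics = {}

    def create_vic(self, name):
        self.vics[name] = VIC(name)
        return self.vics[name]

    def associate_doorbell(self, vic, callback):
        vic.doorbell = callback

    def create_filter(self, vic, predicate):
        self.nic.rx_filters.append((predicate, vic))


nic = ToyNIC()
cp = ControlPlane(nic)
web = cp.create_vic("web")
events = []
cp.associate_doorbell(web, lambda vic: events.append(vic.name))
cp.create_filter(web, lambda p: p["proto"] == "tcp" and p["dst_port"] == 80)

hit = nic.deliver({"proto": "tcp", "dst_port": 80, "payload": b"GET /"})
miss = nic.deliver({"proto": "udp", "dst_port": 53, "payload": b"?"})
print(len(web.rx_queue), events, miss)
```

The split between the two classes mirrors the design in the slides: set-up calls go through the control plane once, while packet delivery afterwards touches only the NIC model and the application's queue.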

Control Plane Features
  • Access control: enforced by filters; infrequently invoked (during set-up etc.)
  • Resource limiting: send commands to hardware I/O schedulers
  • Naming: VFS in kernel; actual storage implemented in apps

Network Data Interface
  • Apps send/receive directly through sets of queues
  • Filters applied for multiplexing
  • Doorbells used for asynchronous notification (e.g. packet arrival)
  • Both native (w/ zero-copy) and POSIX interfaces are implemented

Storage Data Interface
  • VSA supports read, write, flush
  • Persistent data structures (log, queue)
      modified Redis by 109 LOC
      operations immediately persistent on disk
      eliminate marshaling (layout in memory = layout on disk)
      data-structure-specific caching & early allocation

Evaluation
  1. UDP echo server
  2. Memcached key-value store
  3. Redis NoSQL store
  4. HTTP load balancer (haproxy)
  5. IP-layer middlebox
  6. Performance isolation (rate limiting)

Case 1: UDP echo

Case 2: Memcached

Case 3: Redis NoSQL

Case 3: Redis NoSQL (contd.)
  • Reduced in-memory GET latency by 65%: 9 us (Linux) vs. 4 us (Arrakis)
  • Reduced persistent SET latency by 81%: 163 us (Linux) vs. 31 us (Arrakis)
[Figure: stacked bars from 0% to 100% splitting each latency among Hardware, Kernel/libIO, and App, for Linux and Arrakis]
Adapted from the original presentation at OSDI '14

Case 4: HTTP load balancer (haproxy)

Case 5: IP-layer middlebox

Case 6: Performance isolation

Conclusion
Pros:
  • Much better raw performance (for I/O-intensive data center apps)
      Redis: up to 9x throughput and 81% lower persistent SET latency
      Memcached: scales to 3x throughput
Cons:
  • Some features require hardware functionality that is not yet available
  • Requires modification of applications
  • Not clear about storage abstractions
  • Inside the hardware, not easy to track behaviors

Discussion
  • Related work (IX, Exokernel, Multikernel, etc.)
  • Is Arrakis trading OS features for raw performance?
      How will new techniques change this trade-off? (SDN, NetFPGA)
      And of course, how much does raw performance matter?
  • Security concerns

Related Work
  • 90s library OSes: Exokernel, SPIN, Nemesis
  • Kernel bypass: U-Net, Infiniband, Netmap, Moneta-D
  • High-performance I/O stacks: mTCP, OpenOnLoad, Sandstorm, Aerie
  • IX, Dune; Barrelfish (Multikernel)
Adapted from the original presentation at OSDI '14

IX, Arrakis, Exokernel, Multikernel
Arrakis is like Exokernel built on Barrelfish (multikernel).

                                     IX                       Arrakis
  Reduce syscall overhead            Adaptive batching,       No syscall in data plane
                                     run to completion
  Hardware virtualization            No IOMMU, no SR-IOV      Expect more than what we have
  Enforcement of network I/O policy  Under software control   Rely on hardware

Raw performance vs. (everything else)
Two potential (and maybe diverging) directions:
  • Be hardware-dependent (NetFPGA etc.)
  • Be software-controllable (SDN etc.)
[Figure: 60s switchboard operator; Modern Operating Systems]

Security concerns
Will bypassing the kernel be safe?
