HPC Cloud Bad; HPC in the Cloud Good
Josh Simons, Office of the CTO, VMware, Inc.
IPDPS 2013, Cambridge, Massachusetts
© 2011 VMware Inc. All rights reserved

Post-Beowulf Status Quo
- Enterprise IT
- HPC IT

Closer to True Scale (NASA)

Converging Landscape
Convergence driven by increasingly shared concerns, e.g.:
- Scale-out management
- Power & cooling costs
- Dynamic resource mgmt
- Desire for high utilization
- Parallelization for multicore
- Big Data Analytics
- Application resiliency
- Low latency interconnect
- Cloud computing
[Diagram labels: Enterprise IT, HPC IT]

Agenda
- HPC and Public Cloud
  - Limitations of the current approach
- Cloud HPC Performance
  - Throughput
  - Big Data / Hadoop
  - MPI / RDMA
- HPC in the Cloud
  - A more promising model

Server Virtualization
[Diagram: Without Virtualization vs. With Virtualization; layers: Application, Operating System, Hardware]
- Hardware virtualization presents a complete x86 platform to the virtual machine
- Allows multiple applications to run in isolation within virtual machines on the same physical machine
- Virtualization provides direct access to the hardware resources to give you much greater performance than software emulation

HPC Performance in the Cloud
http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/Magellan_final_report.pdf

Biosequence Analysis: BLAST
C. Macdonell and P. Lu, "Pragmatics of Virtual Machines for High-Performance Computing: A Quantitative Study of Basic Overheads," in Proc. of the High Perf. Computing & Simulation Conf., 2007.
Biosequence Analysis: HMMer

Molecular Dynamics: GROMACS

EDA Workload Example
[Diagram: applications on an operating system on hardware, compared with applications in guest OSes on a virtualization layer on hardware]
- Virtual 6% slower
- Virtual 2% faster
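The "slower" and "faster" figures on this slide can be read as run-time ratios of the kind used later in the deck ("Ratio to Native, Lower is Better"). A minimal sketch of that arithmetic, assuming the percentages refer to elapsed time; the function names and the sample timings are hypothetical and not taken from the slides:

```python
def ratio_to_native(virtual_seconds: float, native_seconds: float) -> float:
    """Return virtual elapsed time divided by native elapsed time (lower is better)."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return virtual_seconds / native_seconds


def describe(ratio: float) -> str:
    """Phrase a run-time ratio the way the slide does."""
    percent = round(abs(ratio - 1.0) * 100, 1)
    if ratio > 1.0:
        return f"Virtual {percent:g}% slower"
    if ratio < 1.0:
        return f"Virtual {percent:g}% faster"
    return "Virtual equal to native"


# Hypothetical timings chosen only to reproduce the slide's two labels.
slower = describe(ratio_to_native(106.0, 100.0))
faster = describe(ratio_to_native(98.0, 100.0))
```

Under this reading, a ratio of 1.06 corresponds to "6% slower" and 0.98 to "2% faster". If the original measurements were throughput rather than elapsed time, the ratio would need to be inverted.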
[Diagram labels: ESXi hypervisor, two sockets, each with M]

vNUMA Performance Study
Ali, Q., Kiriansky, V., Simons, J., Zaroo, P., "Performance Evaluation of HPC Benchmarks on VMware's ESX Server," 5th Workshop on System-level Virtualization for High Performance Computing, 2011.

Compute: GPGPU Experiment
- General Purpose (GP) computation with GPUs
- CUDA benchmarks
- VM DirectPath I/O
- Small kernels: DSP, financial, bioinformatics, fluid dynamics, image processing
- RHEL 6
- NVIDIA (Quadro 4000) and AMD GPUs
- Generally 98%+ of native performance (worst case was 85%)
- Currently looking at larger-scale financial and bioinformatics applications

MapReduce Architecture
[Diagram labels: MAP, Reduce, HDFS]
vHadoop Approaches
[Diagram labels: VM, Node, M, R, HDFS, Compute Node (CN), Data Node]
Why virtualize Hadoop?
- Simplified Hadoop cluster configuration and provisioning
- Support Hadoop usage in existing virtualized datacenters
- Support multi-tenant environments
- Project Serengeti

vHadoop Benchmarking Collaboration with AMAX
- Seven-node Hadoop cluster (AMAX ClusterMax)
- Standard tests: PI, DFSIO, Teragen / Terasort
- Configurations:
  - Native
  - One VM per host
  - Two VMs per host
- Details:
  - Two-socket Intel X5650, 96 GB, Mellanox 10 GbE, 12x 7200rpm SATA
  - RHEL 6.1, 6- or 12-vCPU VMs, vmxnet3
  - Cloudera CDH3U0, replication=2, max 40 map and 10 reduce tasks per host
  - Each physical host considered a rack in Hadoop's topology description
  - ESXi 5.0 w/dev Mellanox driver, disks passed to VMs via raw disk mapping (RDM)

Benchmarks
Pi
- Direct-exec Monte-Carlo estimation of pi
- # map tasks = # logical processors
- 1.68 T samples
- pi ~ 4*R/(R+G) = 22/7
TestDFSIO
- Streaming write and read
- 1 TB
- More tasks than processors
Terasort
- 3 phases: teragen, terasort, teravalidate
- 10B or 35B records, each 100 Bytes (1 TB, 3.5 TB)
- More tasks than processors
- CPU, networking, and storage I/O

Ratio to Native, Lower is Better
[Figure: y-axis Ratio to Native (0 to 1.2); series: 1 VM, 2 VMs; benchmarks: Pi, TestDFSIO-write, TestDFSIO-read, TeraGen 1TB, TeraSort 1TB, TeraValidate 1TB, TeraGen 3.5TB, TeraSort 3.5TB, TeraValidate 3.5TB]
A Benchmarking Case Study of Virtualized Hadoop Performance on VMware vSphere 5
http://www.vmware.com/files/pdf/VMW-Hadoop-Performance-vSphere5.pdf

Kernel Bypass Model
[Diagram labels: application, user, kernel, guest kernel, vmkernel, sockets, tcp/ip, driver, rdma, hardware]

Virtual Infrastructure RDMA
Distributed services within the platform, e.g.
- vMotion (live migration)
- Inter-VM state mirroring for fault tolerance
- Virtually shared, DAS-based storage fabric
All would benefit from:
- Decreased latency
- Increased bandwidth
- CPU offload
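Returning briefly to the Pi benchmark from the Hadoop slides: its formula, pi ~ 4*R/(R+G), can be illustrated with a small single-process sketch. It assumes R counts sample points that fall inside a quarter circle of radius 1 and G counts those that fall outside; it uses plain pseudo-random sampling and is not the Hadoop job itself. The function names, seed, and sample count are hypothetical.

```python
import random


def pi_from_counts(r: int, g: int) -> float:
    """Apply the slide's formula: pi ~ 4*R/(R+G)."""
    return 4 * r / (r + g)


def estimate_pi(samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    inside = 0  # R: points inside the quarter circle (assumed meaning)
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    outside = samples - inside  # G: points outside the quarter circle
    return pi_from_counts(inside, outside)


# The slide's 22/7 corresponds to, for example, R = 11 and G = 3.
slide_example = pi_from_counts(11, 3)
estimate = estimate_pi(100_000)
```

In the benchmark the sampling is spread over as many map tasks as there are logical processors, which is why the slide treats Pi as the CPU-bound test; the sketch only shows the per-sample arithmetic.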
[Figure: vMotion over TCP/IP vs. RDMA. Panels: Total vMotion Time (sec); Pre-copy bandwidth (Pages/sec), with one data label reading 330,813.66; Destination CPU Utilization (% Core Utilization used by vMotion over Time (s)), 92% lower; Source CPU Utilization (% Core Utilization used by vMotion over Time (s)), 84% lower]

Guest OS RDMA
- RDMA access from within a virtual machine
- Scale-out middleware and applications increasingly important in the Enterprise
  - memcached, redis, Cassandra, mongoDB, GemFire Data Fabric, Oracle RAC, IBM pureScale
- Big Data an important emerging workload
  - Hadoop, Hive, Pig, etc.
- And, increasingly, HPC

SR-IOV Virtual Function VM DirectPath I/O
- Single-Root IO Virtualization (SR-IOV): PCI-SIG standard
- Physical (IB/RoCE/iWARP) HCA can be shared between VMs or by the ESXi hypervisor
  - Virtual Functions directly assigned to VMs
  - Physical Function controlled by hypervisor
- Still VM DirectPath, which is incompatible with several important virtualization features
[Diagram labels: Virtualization Layer, PF Device Driver, I/O MMU, PF, VF, SR-IOV RDMA HCA]

Paravirtual RDMA HCA (vRDMA) offered to VM
- New paravirtualized device exposed to Virtual Machine
  - Implements Verbs interface
- Device emulated in ESXi hypervisor
  - Translates Verbs from the guest into Verbs for the ESXi OFED Stack
  - Guest physical memory regions mapped to ESXi and passed down to physical RDMA HCA
  - Zero-copy DMA directly from/to guest physical memory
  - Completions/interrupts proxied by I/O
[Diagram labels: Guest OS, OFED Stack, vRDMA HCA Device Driver, vRDMA Device Emulation, ESXi OFED Stack]
[Figure: results for the 2n16p, 4n32p, and 8n64p configurations. Data courtesy of Marco Righini, Intel Italy]

Point-to-point Message Size Distribution: STAR-CD
Source: http://www.hpcadvisorycouncil.com/pdf/CD_adapco_applications.pdf

Collective Message Size Distribution: STAR-CD
Source: http://www.hpcadvisorycouncil.com/pdf/CD_adapco_applications.pdf

STAR-CD Virtual to Native Run-time Ratios (Lower is Better)
[Figure: STAR-CD A-Class Model (on 8n32p). Categories: Physical, ESX4 (1 socket), ESX4 (2 socket). Data labels: 1.19, 1.15, 1.00. Y-axis from 0.90 to 1.25]
Data courtesy of Marco Righini, Intel Italy

Software Defined Networking (SDN) Enables Network Virtualization
[Diagram labels: Telephony, 650.555.1212, Identifier = Location; Wireless Telephony, 650.555.1212; Networking, 192.168.10.1, Identifier = Location; VXLAN, 192.168.10.1]

Data Center Networks: Traffic Trends
[Diagram labels: NORTH / SOUTH, WAN/Internet, EAST / WEST]

Data Center Networks: the Trend to Fabrics
[Diagram labels: WAN/Internet]

Network Virtualization and RDMA
SDN
- Decouple logical network from physical hardware
- Encapsulate Ethernet in IP (more layers)
- Flexibility and agility are primary goals
RDMA
- Directly access physical hardware
- Map hardware directly into userspace (fewer layers)
- Performance is primary goal
Is there any hope of combining the two?
- Converged datacenter supporting both SDN management and decoupling along with RDMA

Secure Private Cloud for HPC
[Diagram labels: Users, Research Group 1 to Research Group m, IT, Public Clouds, VMware vCloud Director (User Portals, Catalogs, Security), VMware vCloud API, Research Cluster 1 to Research Cluster n, VMware vShield, Programmatic Control and Integrations, VMware vCenter Server, VMware vSphere]

Massive Consolidation

[Diagrams on the following slides: App A, App B, and App C running on OS A and OS B, each on a virtualization layer over hardware]

Run Any Software Stacks
- Support groups with disparate software requirements
- Including root access

Separate workloads
- Secure multi-tenancy
- Fault isolation
- and sometimes performance

Live Virtual Machine Migration (vMotion)

Use Resources More Efficiently
- Avoid killing or pausing jobs
- Increase overall throughput

Multi-tenancy with resource guarantees
- Define policies to manage resource sharing between groups

Protect Applications from Hardware Failures
- Reactive Fault Tolerance: Fail and Recover

Protect Applications from Hardware Failures
- Proactive Fault Tolerance: Move and Continue
[Diagram labels: MPI-0, MPI-1, MPI-2, each on an OS over a virtualization layer and hardware]

Unification of IT Infrastructure

HPC in the (Mainstream) Cloud
[Diagram labels: MPI / RDMA, Throughput]
Summary
HPC Performance in the Cloud
- Throughput applications perform very well in virtual environments
- MPI / RDMA applications will experience small to very significant slowdowns in virtual environments, depending on scale and message traffic characteristics
Enterprise and HPC IT requirements are converging
- Though less so with HEC (e.g. Exascale)
Vendor and community investments in Enterprise solutions eclipse those made in HPC due to market size differences
- The HPC community can benefit significantly from adopting Enterprise-capable IT solutions
- And working to influence Enterprise solutions to more fully address HPC requirements
Private and community cloud deployments provide significantly more value than cloud bursting from physical infrastructure to public cloud