Alliance for Computing at Extreme Scale


The Role of Advanced Technology Systems in the ASC Platform Strategy
Douglas Doerfler, Distinguished Member of Technical Staff
Sandia National Laboratories, Scalable Computer Architectures Department
SAND 2014-3174C, Unlimited Release

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Topics
- ASC ATS computing strategy
- Partnerships & the joint procurement process
- Trinity project status
- Advanced Architecture Test Bed project

ASC computing strategy
Approach: two classes of systems
- Advanced Technology: first-of-a-kind systems that identify and foster technical capabilities and features that are beneficial to ASC applications
- Commodity Technology: robust, cost-effective systems to meet the day-to-day simulation workload needs of the program
Investment principles
- Maintain continuity of production
- Ensure that the needs of the current and future stockpile are met
- Balance investments in system cost-performance types with computational requirements
- Partner with industry to introduce new high-end technology constrained by life-cycle costs
- Acquire right-sized platforms to meet the mission needs

Advanced Technology Systems
- Leadership-class platforms
- Pursue promising new technology paths with industry partners
- These systems are to meet unique mission needs and to help prepare the program for future system designs
- Includes Non-Recurring Engineering (NRE) funding to enable delivery of leading-edge platforms
- Trinity (ATS-1) will be deployed by ACES (New Mexico Alliance for Computing at Extreme Scale, i.e., Los Alamos & Sandia)
- ATS-2 will be deployed by LLNL

ASC Platform Timeline
(Figure: fiscal years 2012 to 2021, showing development & deployment, use, and retirement phases and system delivery points. Advanced Technology Systems (ATS): Cielo (LANL/SNL), Sequoia (LLNL), ATS-1 Trinity (LANL/SNL), ATS-2 (LLNL), ATS-3 (LANL/SNL). Commodity Technology Systems (CTS): Tri-lab Linux Capacity Cluster II (TLCC II), CTS-1, CTS-2.)

The ACES partnership since 2008
- SNL/LANL MOU signed March 2008 to integrate and leverage capabilities
- Commitment to the shared development and use of HPC to meet NW mission needs
- Major efforts are executed by project teams chartered by and accountable to the ACES co-directors
- Cielo delivered ca. 2011
- Trinity delivery in 2015 is now our dominant focus
- Both Laboratories are fully committed to delivering a successful platform, as it is essential to the Laboratories

NNSA/ASC and SC/ASCR are partnering on RFPs
- Trinity/NERSC-8: ACES & LBL
- CORAL: LLNL, ORNL and ANL
- Strengthen the alliance between NNSA and SC on the road to exascale
- Show vendors a more united path on the road to exascale
- Shared technical expertise between labs
- Should gain cost benefit
- Saves vendors money and time: they respond to a single RFP with a single set of technical requirements
- Outside perspective reduces risk and avoids tunnel vision by any one lab
- More leverage with vendors by sharing information between labs
- Benefits in production: shared bug reports, quarterly meetings
- Less likely to be a one-off system with multiple sites participating

Why are NNSA/ASC and SC/ASCR collaborating?
The April 2011 MOU between SC & NNSA for coordinating exascale activities was the impetus for ASC and ASCR to work together on the proposed Exascale Computing Initiative (ECI). While ECI is yet to be realized, the ASC & ASCR program directors made strategic decisions to co-fund and collaborate on:
- Technology R&D investments: FastForward and DesignForward
- System acquisitions: Trinity/NERSC-8 and CORAL
These are great leveraging opportunities to share precious resources (budget & technical expertise) to achieve each program's mission goals, while working out some cultural and bureaucratic differences. The Trinity/NERSC-8 collaboration will proceed with a joint RFP and selection, separate system awards, attendance at the other system's project reviews, and collective problem solving.

Trinity & NERSC-8 are two separate projects resulting in two distinct contracts and systems
(Figure: process flow. Trinity and NERSC-8 each start from their own mission drivers and market surveys, plus joint market surveys; requirements are created, the RFP is released and a vendor is selected. Negotiations then proceed separately, producing a Trinity contract and system and a NERSC-8 contract and system, linked by joint quarterly reviews, bug reports, collaboration on application transition, and advanced options.)

Target System Configurations

Metric                                          Trinity                NERSC-8
Memory capacity                                 2 PB to 4 PB           1 PB to 2 PB
Capability improvement                          8 to 10x over Cielo    8 to 10x over Hopper
Sustained System Performance (SSP)              20 to 60x over Hopper  10 to 30x over Hopper
JMTTI                                           > 24 hours             > 35 hours
File system BW metric: time to dump 80% of RAM  20 min                 30 min
File system disk capacity                       > 30x main memory      > 20x main memory
Power                                           < 12+3 MW              < 6 MW
Off-platform I/O                                > 140 GB/s             > 180 GB/s

ATS Components of Trinity
- Application transition
  MPI + (threads, vectors, data locality & allocation)
  Center of Excellence with key technology providers
- Tightly integrating non-volatile memory, i.e., Burst Buffer
  Checkpoint/restart
  In-situ data analysis
- Advanced power management
  Better understand and then control the power usage characteristics at the platform and application level

Focus Area: Active Power Management

- Need to understand power at the platform & application level
- Policy driven
  Weighted combination of performance & energy
  Energy caps based on time of day, physical capacity, etc.
- Need to understand & control power
  Cabinet & component level current (I) and voltage (V) measurements
  Scalable collection infrastructure
  Tunable collection fidelity: cabinets to components
  Administrative & user-accessible interface for feedback and tuning
- P-states (frequency/voltage states)
  P1: 2.1 GHz, 1.25 V
  P2: 1.7 GHz, 1.1625 V
  P3: 1.4 GHz, 1.125 V
  P4: 1.1 GHz, 1.1 V
- An AMG demonstration on 6,144 nodes of ORNL's Jaguar shows that managing P-states allows a 32% decrease in energy used while increasing time to solution by only 7.5%
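One way to read the P-state table is through the textbook CMOS approximation that dynamic power scales with V²·f. The sketch below assumes that model (it is not stated on the slide, and it ignores leakage and all non-CPU power) and also backs out the average power ratio implied by the AMG result via E = P·t. The helper names are illustrative only.

```python
"""P-state power arithmetic (a sketch, not a power model of any real node).

Assumption not taken from the slides: CPU dynamic power scales as V^2 * f,
the textbook CMOS approximation. Leakage and non-CPU power are ignored.
"""

# P-state table from the slide: name -> (frequency in GHz, voltage in V)
P_STATES = {
    "P1": (2.1, 1.25),
    "P2": (1.7, 1.1625),
    "P3": (1.4, 1.125),
    "P4": (1.1, 1.1),
}


def relative_dynamic_power(state: str, reference: str = "P1") -> float:
    """Dynamic power of `state` relative to `reference`, assuming P ~ V^2 * f."""
    f, v = P_STATES[state]
    f_ref, v_ref = P_STATES[reference]
    return (v * v * f) / (v_ref * v_ref * f_ref)


def implied_power_ratio(energy_ratio: float, time_ratio: float) -> float:
    """Average power ratio implied by E = P * t: (E_new/E_old) / (t_new/t_old)."""
    return energy_ratio / time_ratio


if __name__ == "__main__":
    for name, (ghz, _volts) in P_STATES.items():
        clock = ghz / P_STATES["P1"][0]
        print(f"{name}: clock {clock:.2f}x, dynamic power {relative_dynamic_power(name):.2f}x of P1")
    # AMG on Jaguar (from the slide): 32% less energy, 7.5% longer time to solution
    print(f"Implied average power: {implied_power_ratio(1 - 0.32, 1 + 0.075):.0%} of baseline")
```

Under this model P4 runs at about 52% of P1's clock but draws roughly 41% of its dynamic power; whether that saves energy depends on how much the runtime stretches. The AMG numbers (32% less energy for 7.5% more time) correspond to running at about 63% of the baseline average power.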

The Need for a Hybrid Storage Model (Grider, et al.)
New assumptions; economic analysis results
- Must meet two requirements: 900 PB capacity and 100 TB/s. At exascale, storage has to meet:
  100 TB/s burst (a 30 PB burst for 5 minutes every hour)
  900 PB of scratch capacity (about 30 memories)
- MTTI will likely go down
- Hybrid: disk for capacity, SSD for bandwidth
(Figure: an all-disk design, in which compute units reach IO nodes and a site-wide shared global parallel file system over the interconnect, compared with a hybrid design that adds SSDs at the IO nodes.)
- Must move the checkpoint device closer to compute: memory on node has jitter issues (memory QoS/network QoS, node board space/economics, etc.); off node could be done without the interconnect, but the interconnect enables much more function; at least near node is required
- Leads to the hybrid storage model

Checkpoint/Restart and Application Efficiency
Nomenclature (John T. Daly, "A higher order estimate of the optimum checkpoint interval for restart dumps," Future Generation Computer Systems 22 (2006), pp. 303-312):
- Tau = time for work between checkpoints
- Delta = checkpoint time
- R = restart time
- Ts = total compute time to solution
- Tw = wall-clock time to solution
Assumptions: N is large, Ts >> Delta

Daly Model Example (Ts = 500 hours; Delta = R = 20 minutes)

JMTTI (hours)   Optimal Tau (minutes)   Tw (hours)   Efficiency
24              230                     601          83%
12              160                     657          76%
6               110                     752          66%
3               70                      932          54%

(Figure: total wall-clock time Tw (hours, 0 to 1,600) vs. interval between dumps Tau (minutes, 20 to 240), one curve per JMTTI of 24, 12, 6 and 3 hours.)
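The table can be reproduced from Daly's higher-order model. The forms assumed here, with M standing for JMTTI, are τ_opt = √(2δM)·[1 + (1/3)√(δ/2M) + (1/9)(δ/2M)] − δ for δ < 2M (else τ_opt = M), and Tw = M·e^(R/M)·(e^((τ+δ)/M) − 1)·Ts/τ; treat them as a sketch to check against the cited paper. With Ts = 500 hours and δ = R = 20 minutes, the efficiencies round to the tabulated 83%, 76%, 66% and 54%.

```python
"""Daly's higher-order checkpoint model (sketch; formulas assumed from Daly 2006).

Symbols follow the slide's nomenclature; all times are in minutes:
  M      JMTTI (mean time to interrupt)
  delta  checkpoint (dump) time
  R      restart time
  tau    compute time between checkpoints
  Ts     solve time with no interrupts or checkpoints
  Tw     expected wall-clock time to solution
"""
import math


def optimal_tau(delta: float, M: float) -> float:
    """Higher-order estimate of the optimal compute interval between dumps."""
    if delta >= 2 * M:
        return M
    x = delta / (2 * M)
    return math.sqrt(2 * delta * M) * (1 + math.sqrt(x) / 3 + x / 9) - delta


def wall_clock(Ts: float, tau: float, delta: float, R: float, M: float) -> float:
    """Expected wall-clock time: Tw = M * e^(R/M) * (e^((tau + delta)/M) - 1) * Ts / tau."""
    return M * math.exp(R / M) * (math.exp((tau + delta) / M) - 1) * Ts / tau


def efficiency(Ts: float, tau: float, delta: float, R: float, M: float) -> float:
    """Fraction of wall-clock time spent on useful work, Ts / Tw."""
    return Ts / wall_clock(Ts, tau, delta, R, M)


if __name__ == "__main__":
    Ts = 500 * 60.0      # 500 hours of solve time
    delta = R = 20.0     # 20-minute dumps and restarts
    for jmtti_hours in (24, 12, 6, 3):
        M = jmtti_hours * 60.0
        tau = optimal_tau(delta, M)
        Tw = wall_clock(Ts, tau, delta, R, M)
        print(f"JMTTI {jmtti_hours:>2} h: tau {tau:4.0f} min, "
              f"Tw {Tw / 60:4.0f} h, efficiency {Ts / Tw:.0%}")
    # With R = delta, efficiency depends only on delta/M, so take M = 1.
    r = 1 / 200
    print(f"delta/M = 1/200: efficiency {efficiency(1.0, optimal_tau(r, 1.0), r, r, 1.0):.0%}")
```

The computed optimal Tau values land within a few minutes of the table and Tw within about an hour. The last line gives about 90% at Delta/JMTTI = 1/200, consistent with the (1/200, 90%) point quoted for the optimal efficiency graph.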

- As JMTTI decreases, the optimal checkpoint interval decreases and efficiency decreases
- Choosing an optimal checkpoint interval becomes more important
- I.e., resilience techniques that improve efficiency are crucial to ensure efficient use of a job's time allocation

Optimal efficiency graph
(Figure: efficiency as a function of Delta/JMTTI, running from "goodness" at small ratios to "badness" at large ones; marked point (1/200, 90%). For the derivation, see Josip Loncaric @ LANL.)

Trinity Status

- Formal Design Review completed April 2013
- Independent Project Review (Lehman) completed May 2013
- Trinity/NERSC-8 RFP released August 2013
- Technical evaluation of the proposals completed September 2013
- Initial negotiations for both systems completed November 2013
- NNSA Independent Cost Review completed January 2014
- NERSC-8 is in the final contract approval stages and should be announced soon
- Trinity went back to the proposing vendors for a Best and Final Offer (BAFO)
- Target delivery date of September 2015 is unchanged

Test Bed Strategy complements ATS
Motivation
- Significant PRODUCTION code rewrite/modification will be required for future platforms
- Ensure that when codes make the change, it is the right move for longevity, porting efforts, performance, etc.
- Eliminate or reduce missteps
Philosophy
- Have an APPLICATION focus; reduce the impact on codes in a rapidly changing technology environment
- Both hardware and software are intended (and have proven) to be highly dynamic
- INTENTIONALLY closer to prototypes than production
- It is more important to explore a diverse set of architectural alternatives at rack scale than to push large scale
- Primary target is ASC researchers, in addition to ASCR co-design centers

(Figure: test bed timeline, Sept. 2011 to Sept. 2014 and beyond. Systems: Arthur, Teller, Teller V2, Compton, Compton V2, Curie (XK6), Curie V2 (XK7), Shannon, Shannon V2, Shannon V3, Watson, Volta (XC), Hammer, the planned JENGA, and several TBD entries. Node technologies include Westmere + Knights Ferry; Llano + PowerInsight; Kaveri + PowerInsight V2; SandyBridge + Knights Corner (B) and (C); Haswell + PowerInsight V2; IvyBridge; Trinity + PowerInsight; Interlagos + Fermi 2090X; Interlagos + Kepler K20X; SandyBridge + Kepler K20, K40 and TBD; Power 7 + FPGA; Power 8; 64-bit ESP; and HMC.)

Exploration Matrix
(Table: programming models explored on each architecture. Column groups: node level, threading, vectorization, future languages; models: OpenMP, MPI, intrinsics, CUDA, OpenACC, OpenCL, Kokkos Array, Cilk+, TBB, ArBB/CEAN, qthreads, pthreads, MKL/math libraries, advanced languages, DSLs. Architectures: NVIDIA (GPU), AMD (CPU), AMD (APU), Intel (CPU), Intel (MIC), IBM, ARM. Applications: miniFE, miniMD, miniGhost, miniAMR, miniSMAC, LULESH, SNAP, CoMD, S3D/SMC, NEK5000.)

Example Analysis using Test Beds
(Figures: CPU/GPU and memory power in Watts vs. execution time in seconds, measured with PowerInsight; Intel KNC MPI/OpenMP tradeoff study, relative runtime vs. #MPI ranks/#OpenMP threads; HP/APM X-Gene speedup relative to 1 core vs. #MPI ranks/cores (1 to 8). Applications shown include AMG, UMT, SNAP, miniFE and LULESH.)

Thank You
Questions?
Thank you to the following for content: the Trinity & NERSC-8 project teams; Thuc Hoang, ASC HQ; Manuel Vigil, LANL; Jim Laros, SNL; Josip Loncaric, LANL; Simon Hammond, SNL; and a cast of many others.

Trinity is Sized for High Fidelity Workloads
- Point design: a current LANL 3D problem runs on a fraction of Cielo today, using about 80 TBytes
- Projected capacity for a higher fidelity problem is about 750 TBytes in the Trinity timeframe
- Trinity is sized to support 2 to 4 jobs of this class -> 2 to 4 PBytes -> a fraction of the system

Trinity Facility, Power & Cooling
- Trinity will be located in the Nicholas C. Metropolis Center (SCC) at Los Alamos National Laboratory
- Facility power is one of the primary constraints in the design of Trinity
- 12 MW water cooling + 2-3 MW (maybe 4 MW) air cooling available, inclusive of storage and any other externally attached equipment
- 300 lbs per square foot floor loading
- 10,000 to 12,000 square feet of floor space
- At least 80% of the platform will be water cooled
  Direct (direct to chip or cold plate) is preferred
  Indirect (e.g., radiator) method is acceptable
  Tower water (directly from the cooling tower) at up to 32 °C is preferred
  Chilled water at 8.5 °C is available but less desirable due to additional cost
  Under-floor air at 12.5 °C is available to supplement the water cooling method
- Concerns
  Idle power efficiency
  Rapid ramp up / ramp down of load on the power grid, over 2 MW
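The facility limits above can be written as a simple feasibility check for a candidate platform. In this sketch the `Candidate` type, the `violations` helper and the example numbers are hypothetical; the air-cooling limit is a parameter (default 3 MW) because the slide gives 2-3 MW, maybe 4 MW, and whatever is not water cooled is assumed to be air load.

```python
"""Checking a candidate platform against the stated Trinity facility envelope.

Limits come from the slide; the candidate designs below are hypothetical.
"""
from __future__ import annotations

from dataclasses import dataclass

WATER_MW = 12.0             # water-cooling capacity, incl. storage and attached gear
MIN_WATER_FRACTION = 0.80   # at least 80% of the platform water cooled
MAX_FLOOR_LOAD_PSF = 300.0  # floor loading, lbs per square foot
MAX_FLOOR_SQFT = 12_000.0   # upper end of the 10,000-12,000 sq ft range


@dataclass
class Candidate:
    total_mw: float        # total platform power, incl. storage
    water_fraction: float  # fraction of the heat load removed by water
    floor_sqft: float
    floor_load_psf: float


def violations(c: Candidate, air_mw: float = 3.0) -> list[str]:
    """Return the facility constraints the candidate fails (empty list = fits)."""
    problems = []
    water_load = c.total_mw * c.water_fraction
    air_load = c.total_mw - water_load  # assumption: the remainder goes to air
    if c.water_fraction < MIN_WATER_FRACTION:
        problems.append(f"water-cooled fraction {c.water_fraction:.0%} < 80%")
    if water_load > WATER_MW:
        problems.append(f"water load {water_load:.1f} MW > {WATER_MW:.0f} MW")
    if air_load > air_mw:
        problems.append(f"air load {air_load:.1f} MW > {air_mw:.0f} MW")
    if c.floor_sqft > MAX_FLOOR_SQFT:
        problems.append(f"floor space {c.floor_sqft:,.0f} sq ft > {MAX_FLOOR_SQFT:,.0f}")
    if c.floor_load_psf > MAX_FLOOR_LOAD_PSF:
        problems.append(f"floor load {c.floor_load_psf:.0f} psf > {MAX_FLOOR_LOAD_PSF:.0f} psf")
    return problems


if __name__ == "__main__":
    # Hypothetical candidates for illustration only.
    fits = Candidate(total_mw=14.0, water_fraction=0.85, floor_sqft=11_000, floor_load_psf=250)
    too_hot = Candidate(total_mw=15.5, water_fraction=0.80, floor_sqft=11_000, floor_load_psf=250)
    for c in (fits, too_hot):
        print(violations(c) or "fits the stated envelope")
```

The second candidate fails twice: 80% of 15.5 MW exceeds the 12 MW water capacity, and the remaining 3.1 MW exceeds the default air limit. The slide's operational concerns (idle power efficiency, grid ramps of over 2 MW) are not captured by a static check like this.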
