ADAPT: algorithmic differentiation APPLIED TO FLOATING-POINT ...

ADAPT: algorithmic differentiation APPLIED TO FLOATING-POINT ...

ADAPT: ALGORITHMIC DIFFERENTIATION APPLIED TO FLOATING-POINT PRECISION TUNING Harshitha Menon, Michael O. Lam, Daniel OsseiKuffuor, Markus Schordan, Scott Lloyd, Kathryn Mohror, Jeffrey Hittinger PRES-759649 work was performed under the auspices of the U.S. Department of Energy by ence Livermore National Laboratory under Contract DE-AC52-07NA27344. Center for Applied Scientific Computing Lawrence Livermore National Laboratory

SC 2018 MOTIVATION ADAPT EVALUATION 2 MOTIVATION 3 0.0001100110011001100110011001100.

Error : 0.000000095 Time Error: 0.34 seconds 4 5 GOAL Understand the impact of rounding errors so that we can help develop code that executes faster with better integrity and correctness of result.

6 FLOATING-POINT PRECISION HPC applications extensively use floating point arithmetic operations Computer architectures support multiple levels of precision Higher precision - improve accuracy Lower precision - reduces running time, memory pressure, energy consumption 7

MIXED-PRECISION ARITHMETIC Using multiple levels of precision in a single program Without affecting correctness Improving performance Manually optimizing for mixed precision is challenging 8 Develop an automated analysis technique to Identify the variables that require higher precision

to ensure correctness. Use mixed-precision to achieve a desired output accuracy to improve performance. 9 RELATED WORK Automatically discovering unstable floating-point operations and applying transformations Herbie [Panchekha14 et al.], Darulova18 et al. Search based methods CRAFT [Lam13 et al.], Precimonious/HiFPTuner [Rubio13 et al.]

Rigorous error analysis methods FPTuner [Chiang17 et al.], Rosa/Daisy [Darulova14 et al.] 10 ADAPT : FLOATING-POINT PRECISION ANALYSIS 1 1 HOW DOES THE OUTPUT CHANGE

WITH RESPECT TO ITS INPUTS? For a given y = f(x) First order Taylor series approximation at x=a y = f(a) x Generalizing it for y = f(x1, x2, ,xn) at xi=ai y = fx1(a) x1 ++ fxn(a) xn Obtain f(a) using Algorithmic Differentiation (AD) 12 ALGORITHMIC DIFFERENTIATION (AD) Compute the derivative of the output of a function with

respect to its inputs A program is a sequence of operations Apply the chain rule of differentiation at each operation AD has been used in sensitivity analysis in various domains

AD Tools: CoDiPack, Tapenade Alternatives to AD: Symbolic differentiation, Finite difference 13 REVERSE MODE OF ALGORITHMIC DIFFERENTIATION Y

14 OUTPUT ERROR ESTIMATION Obtain fxi(a) using algorithmic differentiation (AD) Reverse mode of AD is used to compute the partial derivatives of all the variables with respect to the output in a single execution. 15 MIXED-PRECISION

ALLOCATION Estimate the error due to lowering the precision of every dynamic instance of a variable Aggregate the error over all dynamic instance of the variable Greedy approach Sort variables based on error contribution Variables switched to lower precision - estimated error contribution within threshold 16 EVALUATION

Benchmarks and Mini-Apps: 6 benchmarks including the ones from previous work HPCCG, LULESH System: Quartz (Intel Xeon E5-2695 processors with 2.1 GHz cores and 128 GB of memory per node) Blue Waters (XK7 nodes with NVIDIA Kepler GPU) Comparison with existing tools Precimonious, CRAFT : search based FPTuner : real-valued expression 18

EVALUATION ON HPCCG HPCCG from Mantevo benchmark suite ADAPT is able to identify critical sections that need to be in higher precision Mixed precision analysis version achieves 1.1x speedup. 19 EVALUATION ON LULESH Used ADAPT on LULESH to create

mixed precision sensitivity profile Used the profile as a guide to develop a mixed precision version for a CUDA implementation of LULESH Achieved speedup of 1.2x within error threshold of 1e-11 on GPU 20 EVALUATION Program Error

Threshold Output Error Speedup arclength 1E-12 1.7E-13 1.11

simpsons 1E-12 4.5E-14 1.13 jetEngine 1E-13

2.5E-13 1.40 carbonGas 1E-10 1.0E-11 1.57 HPCCG

1E-10 2.8E-15 1.10 LULESH 1E-08 1.8E-11

1.20 21 COMPARISON WITH EXISTING TOOLS CRAFT Search based approach Analyzes instructions Precimonious Search based approach Explores hundreds of configurations for tiny benchmarks

FPTuner Rigorous approach Supports only real-valued expression language 23 Analysis Time wrt App time ANALYSIS TIME 1 1 1 1 1

1 1 1 1 1 Precimonious H FPTuner ADAPT

arclength simpsons jetEngine carbonGas Analysis time wrt to the application time. ADAPT has the lowest analysis time 24 LIMITATIONS Analysis limited to inputs used Use representative datasets Control-flow divergence:

Consider control-flow variables as one of the dependent variables Memory requirements Periodic checkpointing Overhead of type cast operations 25 CURRENT AND FUTURE WORK Automate source-level conversion [Poster #219] Better performance model to assign precision

Extend the framework to analyze other types of errors such lossy compression 2 6 CONCLUSION Method using Algorithmic Differentiation (AD) Obtains close estimate of the output error Scaling better than previous methods Applied to HPC benchmarks Mixed precision version achieves 1.2x

speedup for LULESH on GPU 2 7 THANK YOU! QUESTIONS? This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, via LDRD project 17- SI-004. 2 8 AUTOMATIC FLOATING-POINT

PRECISION ANALYSIS 29

Recently Viewed Presentations

  • MLA In-Text Citations (Modern Language Association

    MLA In-Text Citations (Modern Language Association

    MLA In-Text Citations (Modern Language Association. Examples: Wordsworthstated that Romantic poetry was marked by a "spontaneous overflow of powerful feelings" (263). Romantic poetry is characterized by the "spontaneous overflow of powerful feelings" (Wordsworth 263). Wordsworth extensively explored the role of...
  • RULES OF NOTICE - Mrs. Stanley's Website

    RULES OF NOTICE - Mrs. Stanley's Website

    Copy the passage carefully into your notebook using your pen double-spaced. Title the passage "Rules of Notice" along with the name of your novel and page(s) number the passage came from. Annotate the Passage What to Include… Level 1- External...
  • Chapter 15

    Chapter 15

    Networking
  • Water Quality for Master Gardeners

    Water Quality for Master Gardeners

    Managing Shoreline Properties Your name here Water Cycle Watersheds Groundwater Water Quality & Your Home Septic Systems Bacterial Contamination - Septic System Problems Remedies Proper Maintenance and Prevention Regular pumping and inspection No harmful inputs Water conservation and minimizing inputs...
  • Senior Year - wsfcs.k12.nc.us

    Senior Year - wsfcs.k12.nc.us

    Robert Muhammad, Director. University of North Carolina at . Pembroke. Elizabeth Hunter,Director of Admissions ... Jazmin Aguilar. Andrew Lipovan. Summer Miller. Sam Pomper. William Scurry. Chloe Trinh. Parents. Barbara Trinh. Mt. Tabor Class of 2015 "Never doubt that a small...
  • Elements of Fairy Tales - BJHS 8J ENGLISH

    Elements of Fairy Tales - BJHS 8J ENGLISH

    Summary. While modern fairy tales will include most of these rules in their structure, they may leave some out to shake things up. Examples of this include the "fractured fairy tale" or the reversal of roles - i.e. MegaMind and...
  • Human impedance variability and defibrillator test protocol Why

    Human impedance variability and defibrillator test protocol Why

    ZOLL Investigators. J Am Coll Cardiol 1999; 34: 1595-601. Schneider T, Martens PR, Paschen H, Kuisma M, Wolcke B, Gliner BE, et al. Multicenter, randomized, controlled trial of 150-J biphasic shocks compared with 200- to 360-J monophasic shocks in the...
  • The Digital du Cange: - University of Kentucky

    The Digital du Cange: - University of Kentucky

    The Digital du Cange: Moldy Old Tomes Make an Internet Comeback ... c. 1700 CE to present Some Lexica for Archaic and Classical Latin Thesaurus linguae latinae (TLL). 1900+ Forcellini. ... Stage II Merge the parallel lemmata in the separate...