ADAPT: ALGORITHMIC DIFFERENTIATION APPLIED TO FLOATING-POINT PRECISION TUNING

Harshitha Menon, Michael O. Lam, Daniel Osei-Kuffuor, Markus Schordan, Scott Lloyd, Kathryn Mohror, Jeffrey Hittinger

Center for Applied Scientific Computing
Lawrence Livermore National Laboratory

PRES-759649. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
[Figure labels: Error: 0.000000095; Time Error: 0.34 seconds]

GOAL
Understand the impact of rounding errors so that we can help develop code that executes faster with better integrity and correctness of results.

FLOATING-POINT PRECISION
HPC applications make extensive use of floating-point arithmetic.
Computer architectures support multiple levels of precision.
  Higher precision: improves accuracy
  Lower precision: reduces running time, memory pressure, and energy consumption
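The accuracy side of this trade-off can be illustrated with a small sketch. It is not taken from the talk: the helper `to_float32`, the function `accumulate`, and the choice of summing 0.1 a total of 100,000 times are assumptions made only for illustration. Single precision is emulated by rounding through the standard-library `struct` module after every addition.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running sum n times, optionally rounding to binary32."""
    total = 0.0
    if single:
        step = to_float32(step)
    for _ in range(n):
        total = total + step
        if single:
            # Emulate a single-precision accumulator (hypothetical setup).
            total = to_float32(total)
    return total


n = 100_000
exact = n / 10  # the mathematically exact result of adding 0.1 n times
err64 = abs(accumulate(0.1, n, single=False) - exact)
err32 = abs(accumulate(0.1, n, single=True) - exact)
print(f"double-precision error: {err64:.3e}")
print(f"single-precision error: {err32:.3e}")
```

The single-precision sum drifts much further from the exact value than the double-precision sum, which is the kind of accuracy loss that has to be weighed against the gains in time, memory, and energy.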
MIXED-PRECISION ARITHMETIC
Use multiple levels of precision in a single program:
  Without affecting correctness
  Improving performance
Manually optimizing for mixed precision is challenging.

Develop an automated analysis technique to:
  Identify the variables that require higher precision to ensure correctness.
  Use mixed precision to achieve a desired output accuracy while improving performance.

RELATED WORK
Automatically discovering unstable floating-point operations and applying transformations: Herbie [Panchekha14 et al.], Darulova18 et al.
Search-based methods: CRAFT [Lam13 et al.], Precimonious/HiFPTuner [Rubio13 et al.]
Rigorous error analysis methods: FPTuner [Chiang17 et al.], Rosa/Daisy [Darulova14 et al.]

ADAPT: FLOATING-POINT PRECISION ANALYSIS

HOW DOES THE OUTPUT CHANGE WITH RESPECT TO ITS INPUTS?
For a given $y = f(x)$, the first-order Taylor series approximation at $x = a$ gives
  $\Delta y \approx f'(a)\,\Delta x$
Generalizing to $y = f(x_1, x_2, \ldots, x_n)$ at $x_i = a_i$:
  $\Delta y \approx f_{x_1}(a)\,\Delta x_1 + \cdots + f_{x_n}(a)\,\Delta x_n$
Obtain $f'(a)$ using Algorithmic Differentiation (AD).

ALGORITHMIC DIFFERENTIATION (AD)
Compute the derivative of the output of a function with respect to its inputs.
A program is a sequence of operations.
Apply the chain rule of differentiation at each operation.
AD has been used for sensitivity analysis in various domains.
AD tools: CoDiPack, Tapenade
Alternatives to AD: symbolic differentiation, finite differences

REVERSE MODE OF ALGORITHMIC DIFFERENTIATION
[Figure]
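A minimal sketch of reverse-mode AD may help make the idea concrete. It is a toy tape written for illustration, not the interface of CoDiPack or Tapenade; the class `Var`, the module-level `tape`, the function `backward`, and the example function $y = x_1 x_2 + \sin(x_1)$ are all assumptions. Each operation records its local partial derivatives, and one backward sweep applies the chain rule to obtain the derivative of the output with respect to every recorded value.

```python
import math

tape = []  # every Var is recorded here in creation order


class Var:
    """A recorded value with its adjoint and (parent, local partial) pairs."""

    def __init__(self, val, parents=()):
        self.val = val
        self.adj = 0.0          # will hold d(output)/d(this value)
        self.parents = parents
        tape.append(self)

    def __add__(self, other):
        # d(a + b)/da = 1, d(a + b)/db = 1
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b, d(a * b)/db = a
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))


def sin(x):
    # d sin(a)/da = cos(a)
    return Var(math.sin(x.val), ((x, math.cos(x.val)),))


def backward(y):
    """Single reverse sweep: push adjoints from the output back to the inputs."""
    y.adj = 1.0
    for node in reversed(tape):
        for parent, partial in node.parents:
            parent.adj += node.adj * partial


# Hypothetical example program: y = x1 * x2 + sin(x1)
x1, x2 = Var(2.0), Var(3.0)
t = x1 * x2
y = t + sin(x1)
backward(y)
print(x1.adj, x2.adj, t.adj)
```

After the sweep, `x1.adj` equals $x_2 + \cos(x_1)$ and `x2.adj` equals $x_1$. The intermediate `t` also carries an adjoint, so sensitivities are available for intermediate variables as well as inputs.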
OUTPUT ERROR ESTIMATION
Obtain $f_{x_i}(a)$ using algorithmic differentiation (AD).
Reverse-mode AD computes the partial derivatives of the output with respect to all variables in a single execution.

MIXED-PRECISION ALLOCATION
Estimate the error due to lowering the precision of every dynamic instance of a variable.
Aggregate the error over all dynamic instances of the variable.
Greedy approach:
  Sort variables by error contribution.
  Switch variables to lower precision while the estimated error contribution stays within the threshold.

EVALUATION
Benchmarks and mini-apps:
  6 benchmarks, including those from previous work
  HPCCG, LULESH
Systems:
  Quartz (Intel Xeon E5-2695 processors with 2.1 GHz cores and 128 GB of memory per node)
  Blue Waters (XK7 nodes with NVIDIA Kepler GPU)
Comparison with existing tools:
  Precimonious, CRAFT: search based
  FPTuner: real-valued expressions
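The error-estimation and greedy-allocation steps from the OUTPUT ERROR ESTIMATION and MIXED-PRECISION ALLOCATION slides can be sketched as follows. This is an illustration under stated assumptions, not the ADAPT implementation: the per-instance error model |adjoint × (value − float32(value))|, the cumulative-threshold stopping rule, and the names `estimated_error`, `greedy_allocation`, and `profile` are all hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def estimated_error(instances):
    """Aggregate the first-order error estimate over all dynamic instances.

    `instances` is a list of (value, adjoint) pairs, where the adjoint is
    d(output)/d(value) as a reverse-mode AD tool would report it.
    """
    return sum(abs(adj * (val - to_float32(val))) for val, adj in instances)


def greedy_allocation(profile, threshold):
    """Demote variables in order of increasing estimated error.

    Assumption: stop as soon as the running total would exceed `threshold`.
    """
    errors = {name: estimated_error(inst) for name, inst in profile.items()}
    demoted, total = [], 0.0
    for name in sorted(errors, key=errors.get):
        if total + errors[name] > threshold:
            break
        demoted.append(name)
        total += errors[name]
    return demoted, total


# Hypothetical profile: variable name -> [(value, adjoint), ...]
profile = {
    "alpha": [(0.1, 1.0e-3), (0.2, 1.0e-3)],  # low sensitivity
    "beta": [(0.1, 5.0e3)],                   # high sensitivity
    "gamma": [(0.5, 10.0)],                   # 0.5 is exact in binary32
}
demoted, total = greedy_allocation(profile, threshold=1e-8)
print(demoted, total)
```

In this made-up profile, `gamma` and `alpha` are demoted to single precision, while `beta` stays in double precision because its large adjoint amplifies the rounding error past the threshold.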
EVALUATION ON HPCCG
HPCCG is from the Mantevo benchmark suite.
ADAPT is able to identify the critical sections that need to be in higher precision.
The mixed-precision version achieves a 1.1x speedup.

EVALUATION ON LULESH
Used ADAPT on LULESH to create a mixed-precision sensitivity profile.
Used the profile as a guide to develop a mixed-precision version of a CUDA implementation of LULESH.
Achieved a speedup of 1.2x within an error threshold of 1e-11 on GPU.

EVALUATION
[Table: Program, Error; surviving value: 1.20]

COMPARISON WITH EXISTING TOOLS
CRAFT
  Search-based approach
  Analyzes instructions
Precimonious
  Search-based approach
  Explores hundreds of configurations for tiny benchmarks
FPTuner
  Rigorous approach
  Supports only a real-valued expression language

ANALYSIS TIME
[Figure: analysis time with respect to application time; series: Precimonious, H FPTuner, ADAPT; benchmarks: arclength, simpsons, jetEngine, carbonGas]
ADAPT has the lowest analysis time.

LIMITATIONS
Analysis is limited to the inputs used
  Use representative datasets
Control-flow divergence
  Consider control-flow variables as one of the dependent variables
Memory requirements
  Periodic checkpointing
Overhead of type-cast operations

CURRENT AND FUTURE WORK
Automate source-level conversion [Poster #219]
Better performance model to assign precision
Extend the framework to analyze other types of errors, such as lossy compression

CONCLUSION
Method using Algorithmic Differentiation (AD)
  Obtains a close estimate of the output error
  Scales better than previous methods
Applied to HPC benchmarks
  Mixed-precision version achieves a 1.2x speedup for LULESH on GPU

THANK YOU! QUESTIONS?

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, via LDRD project 17-SI-004.

AUTOMATIC FLOATING-POINT