Randomized Accuracy Aware Program Transformations for Efficient Approximate Computations Sasa Misailovic Joint work with Zeyuan Allen Zhu Jonathan Kelner MIT CSAIL Martin Rinard

Nodes represent computation Edges represent flow of data Functions process individual data Reduction nodes aggregate data

avg avg avg avg

min Functions process individual data Reduction nodes aggregate data avg avg avg

avg f1 f2 f3 min Function substitution Multiple implementations Each has expected error/time

avg avg avg avg min Function substitution

Multiple implementations Each has expected error/time avg avg avg avg

min Sampling inputs of reduction nodes Reductions consume fewer inputs avg avg

min Sampling inputs of reduction nodes Reductions consume fewer inputs Tradeoff Space Time Error Tradeoff Space Time Error Optimal Tradeoff Curve

Time Using the tradeoff curve: Minimize time subject to error bound Minimize error subject to time bound Error Our Result Original program Transformations Analysis Randomized computation Guaranteed expected error/time tradeoff -approximation of optimal tradeoff

Error bound Optimization Optimized program Outline Model of Computation Tradeoff Curve Construction Optimized Program Selection Related Work Model of Computation f

g f g f avg avg f

g avg avg t u t v w u min

v w g Model of Computation g f f g

f m avg g f avg avg f g

n t t t v u w w n min

min 1 v avg avg n u m avg

u v w g Structure of Computation Computation nodes g f m m avg

avg n n t u v w n min 1 DAGs of functions

Functions: arbitrary code Process individual inputs Reduction nodes Aggregation functions Average, min, max, sum Computation Tree Computation nodes and reduction nodes Accuracy-Aware Transformations Function substitution m m avg

avg n n Multiple versions Execute with probability Each has error/time spec Reduction sampling Consume inputs Probability of selecting each input: n Derived error/time specifications

min Average: 1 Min/max: Accuracy-Aware Transformations Function substitution m m avg avg

avg n n Multiple versions Execute with probability Each has error/time spec Reduction sampling Consume inputs Probability of selecting each input: n Derived error/time specifications

min Average: 1 Min/max: Accuracy-Aware Transformations Function substitution m m avg avg

avg n n Multiple versions Execute with probability Each has error/time spec Reduction sampling Consume inputs Probability of selecting each input: n Derived error/time specifications

min Average: 1 Min/max: Accuracy-Aware Transformations Function substitution m m avg avg

avg n n Multiple versions Execute with probability Each has error/time spec Reduction sampling Consume inputs Probability of selecting each input: n Derived error/time specifications

min Average: 1 Min/max: Program Configuration Vector Defines transformed program m Functions: probability of executing each version Reductions: number of elements to sample

m avg avg n n n min 1 0.3 0.6

0.1 Configuration Vector Specifies program version Functions: Findm optimal programprobability of m executing each version = Reductions: number of n

n Find configuration vector thattoachieves elements sample avg avg optimal accuracy vs. performance tradeoff n min 1 0.3 0.6

0.1 Tradeoff Curve Construction: Algorithm Divide and conquer m m avg avg n

n For each subcomputation construct tradeoff curve Dynamic programming Properties n min 1 Polynomial time -approximation of true tradeoff curve Tradeoff Curve Construction: Algorithm

m m avg avg n n n min 1 Tradeoff Curve Construction: Algorithm

m m avg avg n n n min 1 Tradeoff Curve Construction: Algorithm

m m avg avg n n n min 1 Tradeoff Curve Construction: Algorithm

m avg n n n min 1 Tradeoff Curve Construction: Algorithm n n

n min 1 Tradeoff Curve Construction: Algorithm n n n min 1 Tradeoff Curve Construction: Algorithm

n n n min 1 Tradeoff Curve Construction: Algorithm n min 1 Tradeoff Curve Construction: Algorithm

Computation Node Optimization Linear program Variables: Probability to execute each version of Range: Sum: (E0,T0) (E1,T1) (E2,T2) Computation Node Optimization Linear program Variables: Probability to execute each version of Range: Sum: Objective:

(E0,T0) (E1,T1) (E2,T2) Constraint: Computation Node Optimization Linear program Variables: Probability to execute each version of Range: Sum: Objective: (E0,T0) (E1,T1) (E2,T2) Constraint:

Computation Node Optimization Linear program Variables: Probability to execute each version of Range: Sum: Objective: (E0,T0) (E1,T1) (E2,T2) Constraint: The Algorithm: Reduction Nodes Given error bound , find number of elements to sample m

avg The Algorithm: Reduction Nodes Given error bound , find number of elements to sample ( , ) s.t. m avg The Algorithm: Reduction Nodes Given error bound , find number of elements to sample ( , ) s.t. m

avg From approximate tradeoff curve: for The Algorithm: Reduction Nodes Given error bound , find number of elements to sample ( , ) s.t. m avg Univariate optimization problem Analogously, minimize error subject to

Approximate Tradeoff Curve Bidimensional Discretization Time Take elements at regular intervals Error Approximating Tradeoff Curve Bidimensional Discretization Time Error Approximating Tradeoff Curve Time Error

Approximating Tradeoff Curve Randomized configuration: Time Execute with probability Execute with probability Error 1

Approximating Tradeoff Curve Time ^ ^ Error Approximating Tradeoff Curve Time

^ T ^ Error Properties of the Algorithm Performance Number of tradeoff curve points: Most expensive operation: bidimensional discretization Calling LP solver times

Each call can have variables Precision Precision decreases linearly with the number of nodes To obtain -approximation set intermediate Space Storing tradeoff curves: Obtaining Optimized Programs Tradeoff curves for all subcomputations: Each curve contains partial configuration Probability of executing local function nodes Number of inputs to sample from reduction node Error tolerated by subcomputation Distribution over optimal program configurations

Incrementally construct configuration vector: For every execution Traverse the tree, starting from root Time to get full vector: Related Work Accuracy-aware transformations Empirical justification: training/test input set [Rinard ICS 06, Rinard OOPSLA 07, Ansel et al. PLDI 09, Misailovic et al. ICSE 10, Baek & Chilimbi PLDI 10 Hoffmann et al. ASPLOS 11, Sidiroglou et al. FSE 11] Probabilistic accuracy analysis for loop perforation [Misailovic et al. SAS 11, Chaudhuri et al. FSE 11] Related Work Accuracy-aware transformations Empirical justification: training/test input set [Rinard ICS 06, Rinard OOPSLA 07, Ansel et al. PLDI 09, Misailovic et al. ICSE 10, Baek & Chilimbi PLDI 10

Hoffmann et al. ASPLOS 11, Sidiroglou et al. FSE 11] Probabilistic accuracy analysis for loop perforation [Misailovic et al. SAS 11, Chaudhuri et al. FSE 11] Ensuring safety of transformed programs Separating critical and approximate parts of program [Carbin & Rinard ISSTA 10, Sampson et al. PLDI 11] Verifying relaxed semantics of programs [Carbin et al. CSAIL-TR 11] Analytic properties of programs [Majumdar & Saha RTSS 09, Chaudhuri et al. POPL10, Ivancic et al. MEMOCODE 10 , Reed & Pierce ICFP 10, Chaudhuri & Solar-Lezama PLDI 10 , Chaudhuri et al. FSE 11] Summary Model of Computation Accuracy-aware program transformations

Effects on overall accuracy and execution time Explore and Exploit Optimal Tradeoffs Approximate optimal tradeoff curve construction Polynomial, dynamic programming algorithm Randomized program configurations to achieve tradeoffs Envisioned Applications Image and video processing, numerical algorithms, queries on big data sets, machine learning, Optimization, fault tolerance, dynamic adaptation