# Machine Learning Based Methods for Invariant Synthesis

PRANAV GARG, P. MADHUSUDAN
UNIV. OF ILLINOIS AT URBANA-CHAMPAIGN

ACKNOWLEDGEMENTS: Daniel Neider (UIUC + UCLA), Dan Roth (UIUC)

OVERVIEW
1. Invariant Synthesis and Machine Learning (1:15hr)
   Inductive synthesis of invariants using learning: the iterative ICE model
   The "gaps" we need to bridge in adapting ML algorithms to invariant synthesis
2. Machine learning: Classification algorithms (1:00hr)
   Learning conjunctions, Linear Threshold Functions, Winnow,

   Perceptron, SVM, Decision trees with Boolean and continuous attributes
   Tools: Weka, C5.0
3. Building bridges from ML to Inv Synthesis (50min)
   Tool: ICE-DT (with Demo)
4. Open problems (15m)
   Broader perspective: Inv Synthesis to Program Synthesis

BRIDGING THE GAPS
Gap #1: Ensuring consistent learning
Gap #2: Convergence under iteration (online learning)
Gap #3: ICE

Machine learning <-> invariant synthesis for program verification.
Machine learning is not necessarily a fuzzy, theorem-less, heuristic-filled

world of ad-hoc algorithms! At one end there is a very strong theory
(computational learning theory) with deep, strong results. But (just like in
our world) heuristics are needed to scale and perform well.

BUILDING BRIDGES TO INV SYNTHESIS
[Table: rows pair a hypothesis class with a classical ML algorithm
(Conjunctions / Elimination Algorithm; Conjunctions as LTFs / Winnow;
Conjunctions as LTFs / SVM; Boolean functions / Decision trees; Boolean and
threshold functions / Decision trees with continuous attributes). Columns
track the gaps: ensuring consistent learning, iterative (+,-) convergence,
passive ICE learning, and iterative ICE convergence. Entries such as "Houdini"
and "Open" mark solved and open cells; the table is filled in as the tutorial
progresses.]

INTRODUCTION: INVARIANT SYNTHESIS AND MACHINE LEARNING
- Invariants: why they are important to synthesize
- Inductive synthesis approach to invariant synthesis: motivation
- Learning from (+,-) samples
- Machine learning: what classifiers are, nomenclature and aims
- The iterative learning model for synthesizing invariants
- Why (+,-) examples do not suffice
- The ICE learning model

- The "gaps" in bridging ML algorithms to invariant synthesis
  + Consistency (removing fuzziness)
  + Convergence under iteration
  + Handling ICE
  + Convergence for ICE

HOW DOES ONE VERIFY SOFTWARE?
Inductive invariant I over the state space:
  Init is included in I, I excludes Bad, and I is closed under program steps:
    Init included in I,   I excludes Bad,   pgm-step(I) included in I.
Hence Reach is contained in I, and the program never reaches a Bad state.

Steps to verify software:
1. Specify an invariant. (Manual; this is the main bottleneck to automatic
   verification.)
2. Validate the invariant:
   - includes Init
   - excludes Bad
   - is inductive
   (Automatic for simple properties; manual for complex properties.)

SOFTWARE VERIFICATION 101

int i;
void increment(int n)
// Requires: (n >= 0);
{
  i = 0;
  while (i < n) {
    i = i + 1;
  }
  return;
}
// Ensures: i == n;

Step 1. Specify an invariant:
  // Loop invariant: i <= n
Step 2. Validate the invariant:
  1. Includes Init:  n >= 0 && i == 0  =>  i <= n
  2. Excludes Bad:   i <= n && !(i < n)  =>  i == n
  3. Is inductive:   i <= n && i < n && i' == i + 1  =>  i' <= n
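The three validation conditions can be sanity-tested by brute force over a
small finite range (a sketch with illustrative bounds; a real verifier would
discharge these conditions with an SMT solver):

```python
# Brute-force check of the three invariant conditions for the increment
# example, over a small finite range of values (a sketch, not an SMT query).
R = range(-5, 6)
inv = lambda i, n: i <= n           # candidate loop invariant

# 1. Includes Init: n >= 0 && i == 0  =>  inv(i, n)
init_ok = all(inv(0, n) for n in R if n >= 0)
# 2. Excludes Bad: inv(i, n) && !(i < n)  =>  i == n
exit_ok = all(i == n for i in R for n in R if inv(i, n) and not i < n)
# 3. Is inductive: inv(i, n) && i < n  =>  inv(i + 1, n)
ind_ok = all(inv(i + 1, n) for i in R for n in R if inv(i, n) and i < n)
```

A candidate such as True (the whole state space) passes the first and third
checks but fails the second, which is why all three conditions are needed.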

Automatic generation of verification conditions (e.g., Boogie):
pure logic formulas whose validity needs to be verified.

Example:  [| x > y |]  x := x + 2; y := y + 1;  [| x > y |]
gives the verification condition
  forall xold, xnew, yold, ynew :
    ( (xold > yold && xnew == xold + 2 && ynew == yold + 1)  =>  xnew > ynew )

Validity of verification conditions is checked mostly automatically
(e.g., by SMT solvers).

CHALLENGES OF FH REASONING: BURDEN OF ANNOTATIONS
Pre/post conditions for logical macro-modules

- Natural to specify; absolutely essential for modular verification!
- This is the specification at the modular level.
Pre/post conditions for smaller modules
  Harder to write/specify  =>  Synthesize
Loop invariants
  Much harder to write; awkward as well, with border conditions.
  Programmers shouldn't have to write most of them  =>  Synthesize
Proof tactics
  Extremely hard to write, even by experts; they require intimate knowledge of
  the underlying platform

  => Synthesize

VISION
With automatic synthesis of annotations and proof tactics, we can provide
programmers with tools adequate to build reliable/secure systems with
reasonable overhead. This will enable a new class of reliable software to be
built that hasn't been possible so far. You shouldn't need a PhD in formal
methods to use these tools: over, say, a semester course, we should be able to
enable programmers to write reliable code with proven properties.

EXPRESSOS: A SECURE FOUNDATION FOR MOBILE APPS ISOLATION (15K LOC)
ExpressOS fragment: cache of DiscPages

class CachePageHolder {
  // this.Head is a sorted list of CachePages
  CachePage Head;
  // Class invariant: sorted(this.Head);
  . . .
  CachePage LookupPrev(int k) {
    // Ensures: (ret != NULL => ret.Key <= k);
    // Ensures: (ret != NULL && ret.Next != NULL) => ret.Next.Key > k;
    CachePage current = Head, prev = null;
    while (current != null && current.Key <= k) {
      prev = current;
      current = current.Next;
    }
    return prev;
  }
  . . .
}

// Loop invariant: sorted(this.Head) &&
//   ((prev == null && Head ->* current) ||
//    (Head ->* prev && current == prev.Next && prev.Key <= k));

INDUCTIVE INVARIANT SYNTHESIS
White-box approaches:

- Fixed-point computation in an abstract domain
- Farkas' lemma
- Interpolation
- IC3

Black-box approaches (more recently):
- Idea: learn the invariant from sample configurations

SYNTHESIS USING LEARNING
Problem: find an adequate inductive loop invariant Inv for a loop.
Technique: ignore the program! Extract some sample configurations that
characterize Inv; try to synthesize some formula that satisfies these samples;
if it doesn't work, get a new sample and continue.
[Diagram: the program yields sample configurations, which feed a learning
algorithm that proposes Inv.]

WHACKY IDEA?
Black-box invariant synthesis: certainly an unusual idea, to hide the program
from the synthesizer.
Motivation:
- the program can be complex: complex memory model, complex semantics,
  pointers, heaps, etc.; user-written annotations can also be complex
- inductive learning ignores this complexity, learning only over a restricted
  observation of the state of the program

INDUCTIVE SYNTHESIS
Invariant synthesis: Houdini, ICE-learning (stochastic search [Sharma et al.
2014], constraint solving, automata [Garg et al. 2014],

machine learning [Garg et al. 2015]).
More general synthesis (program synthesis, completing Sketches, etc.):
black-box synthesis is the norm!
CEGIS (counter-example guided inductive synthesis).
SyGuS synthesis competition: all solvers are based on CEGIS:
- enumerative, stochastic, symbolic, machine learning-based (Alchemist)
- enumerative search using abstract recursive data-types (CVC4)
... more on this at the end of this tutorial.

MACHINE LEARNING
Traditional algorithms: the task the program must perform is clear and
precise; we build algorithms that do the task efficiently.
Machine learning algorithms: algorithms that automatically improve on
(possibly ill-defined) tasks through experience. A learner improves with
experience at some task T
- with respect to some performance measure P
- based on experience E

E.g.: spam filters, recognizing faces in a crowd, credit fraud, finding the
movie to recommend to you from your viewing history.

MACHINE LEARNING
Instance space X; target function f : X -> Y; training sample T of pairs
(x, f(x)). Given T, the learner produces g : X -> Y that "predicts" f.
Most important: how well g works on elements not in T, i.e., how well g
generalizes from T.
Formalizations: PAC, etc., but these are not useful for us.

OVERFITTING AND GENERALIZATION
Given T, "predict" f. A hypothesis g that maps T correctly but maps all other
points to 0 overfits! In order to generalize, g may not even map T correctly.

MACHINE LEARNING
Note: the set of features and the hypothesis space H induce a bias.
Restricting the learner to H gives some generalization.

Sample -> Learner -> hypothesis in H.

LEARNING BOOLEAN FUNCTIONS
Instance space X = {0,1}^n; training sample T; target f : X -> {0,1};
hypotheses g are Boolean functions over x1, ..., xn.

LEARNING BOOLEAN FORMULAS
Concept space: all Boolean functions over x1, ..., xn
Hypothesis space:
- Conjunctions over x1, ..., xn
- Disjunctions over x1, ..., xn

- k-CNF (or k-DNF) over x1, ..., xn
- Linear threshold functions (an arithmetization of Boolean functions)
- Arbitrary Boolean formulas over x1, ..., xn

OVERFITTING AND GENERALIZATION
Sample: { (v1, l1), ..., (vm, lm) } where each vi is a valuation of
x1, ..., xn and each li is a label. Let the hypothesis space be all Boolean
functions. Then the function f that case-splits over the sample, returns each
li on the corresponding vi, and else returns True, overfits the sample.
How do we avoid overfitting?
- Choose a more constrained hypothesis space (e.g., conjunctions)
- Occam's razor: the learner must find a small function.

SOME MACHINE LEARNING ALGORITHMS FOR LEARNING BOOLEAN FUNCTIONS

  Hypothesis space           | Learning algorithm
  ---------------------------|---------------------------------------------------
  Conjunctions               | Elimination algorithm (prefers LARGE conjunctions)
  Linear threshold functions | Winnow, SVM (margin bias)
  Arbitrary Boolean formulas | Decision tree learning (bias towards small trees)

BRIDGING ML AND INV SYNTHESIS
Issue #1: ML algorithms need not return hypotheses consistent with the
(training) sample.

This is a feature! Not fitting the sample is a means to generalization.
In invariant synthesis, not having a hypothesis consistent with the sample is
awkward. We need to address this difference in some way.

PASSIVE LEARNING ISN'T ENOUGH
[Diagram: the program + spec feed a verification oracle; a labeled sample set
(e.g., s1: +, s2: -, s3: +) feeds the learning algorithm, which proposes Inv;
the verifier checks the hypothesis.]

Invariant hypothesis H?
Iterative learning: the learner and teacher work in rounds. If the hypothesis
is not a valid invariant, the teacher returns a counterexample program
configuration, appropriately labeled.

PASSIVE LEARNING ISN'T ENOUGH
[Diagram repeated: labeled sample set, learning algorithm, and verifier
checking the hypothesis H.]
This is not really a problem: we can plug in any passive machine learning
algorithm for the learning algorithm.

But will this iteration CONVERGE?

ITERATIVE LEARNING
Issue #2: when we plug machine learning algorithms into an iterative setting
with a verification oracle, will the learning converge?
More simply: assume the teacher knows a UNIQUE invariant and gives
counterexamples with respect to it. Will the learner eventually learn this
invariant?

ONLINE LEARNING MODEL
The online machine learning model is closest to the iterative model: samples
arrive in a stream, and the learner continually proposes hypotheses.
E.g.: stock-market prediction.
Some of the algorithms we study have online-learning analogs and analyses
(SVMs, Winnow, etc.).

GENERIC MISTAKE BOUND ALGORITHMS

Is it clear that we can bound the number of mistakes?
Let C be a finite concept class; learn f in C.
CON: in the i-th stage of the algorithm:
  Ci := all concepts in C consistent with all i-1 previously seen examples.
  Choose an arbitrary f in Ci and use it to predict the next example.
Clearly C(i+1) is a subset of Ci, and if a mistake is made on the i-th example
then |C(i+1)| < |Ci|, so progress is made.
The CON algorithm makes at most |C| - 1 mistakes. Can we do better?

THE HALVING ALGORITHM
Let C be a concept class; learn f in C.
Halving: in the i-th stage of the algorithm:
  Ci := all concepts in C consistent with all i-1 previously seen examples.
  Given an example ei, consider the value fj(ei) for all fj in Ci and predict
  by majority:
    predict 1  iff  |{ fj in Ci : fj(ei) = 1 }| >= |{ fj in Ci : fj(ei) = 0 }|
The Halving algorithm makes at most log2(|C|) mistakes.

THE HALVING ALGORITHM
The majority vote can be hard to compute. In some cases Halving is optimal
(e.g., when C is the class of all Boolean functions). In general it is not
necessarily optimal: to be optimal, instead of guessing in accordance with the
majority of the valid concepts, we should guess according to the concept group
that gives the least number of expected mistakes (even harder to compute).
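The Halving algorithm can be sketched as follows; the concept class (the 16
monotone disjunctions over four Boolean variables) and the example stream are
illustrative choices, not part of the slides:

```python
import itertools

def halving(concepts, stream):
    # concepts: dict name -> predicate; stream: iterable of (x, label).
    version_space = dict(concepts)
    mistakes = 0
    for x, y in stream:
        votes_for_1 = sum(1 for f in version_space.values() if f(x) == 1)
        pred = 1 if votes_for_1 >= len(version_space) - votes_for_1 else 0
        if pred != y:
            mistakes += 1
        # Keep only the concepts consistent with the new example.
        version_space = {name: f for name, f in version_space.items()
                         if f(x) == y}
    return version_space, mistakes

# Concept class: the 16 monotone disjunctions over 4 Boolean variables.
n = 4
concepts = {S: (lambda S: (lambda x: 1 if any(x[i] for i in S) else 0))(S)
            for r in range(n + 1) for S in itertools.combinations(range(n), r)}
target = concepts[(0, 2)]                      # the disjunction x1 OR x3
stream = [(x, target(x)) for x in itertools.product((0, 1), repeat=n)]
version_space, mistakes = halving(concepts, stream)
```

Every mistake eliminates at least half of the version space, so with |C| = 16
concepts the learner makes at most log2(16) = 4 mistakes.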

POSITIVE / NEGATIVE LABELS ARE NOT SUFFICIENT
In reality, the teacher does not have ONE invariant. The teacher knows NO
invariant, and yet there may be many. The teacher knows only the properties an
invariant must satisfy:
1. Includes Init
2. Excludes Bad
3. Is inductive
It turns out that the teacher cannot give only + or - labeled counterexamples!

COUNTEREXAMPLE: POSITIVE
[Figure: a state in Good that lies outside the hypothesis H is returned as a
new positive counterexample.]

COUNTEREXAMPLE: NEGATIVE
[Figure: a state in the hypothesis H from which Bad is reachable is returned
as a new negative counterexample.]

WHAT HAPPENS IF THE INVARIANT IS NOT INDUCTIVE?
[Figure: H contains a state s1 but not its successor s2.]

We don't know whether these states should be included in the invariant or not.
All we know: if s1 is in the invariant, then s2 must be too.

ICE: LEARNING INVARIANTS USING IMPLICATION COUNTER-EXAMPLES
([Garg et al. CAV 14])
[Diagram: the verification oracle (a verifier checking the hypothesis against
program + spec) returns an ICE sample set: positive points (s7: +, s9: +),
negative points (s8: -), and implication pairs over s1, ..., s6.]

The learner proposes an invariant hypothesis H.
This is a new learning model for robust synthesis of invariants.

LEARNING AN INVARIANT GIVEN AN ICE SAMPLE
[Plot: the program state space over variables x1 and x2, with positive points,
negative points, and implication pairs; candidate half-planes such as
x1 >= -10 and x1 + x2 >= -2 separate them.]

ICE LEARNING
Issue #3: traditional machine learning algorithms learn from (+,-) examples.
How do we extend them so that they can learn from ICE samples?
Issue #4: and how do we make sure these ICE learning algorithms converge to an
invariant eventually?

CHALLENGES TO USING MACHINE LEARNING FOR INVARIANT SYNTHESIS
1. Removing fuzziness from machine learning algorithms:
   how do we make them learn hypotheses consistent with the sample?
2. Will they converge, even with (+,-) counterexamples, to a target concept?
3. How can we adapt them to handle implication counterexamples (ICE)?
4. How do we make these ICE learning algorithms converge?

MEETING THE CHALLENGES
[Table revisited: rows Conjunctions / Elimination Algorithm, Conjunctions as
LTFs / Winnow, Conjunctions as LTFs / SVM, Boolean functions / Decision trees,
Boolean and threshold functions / Decision trees with continuous attributes;
columns: ensuring consistent learning, iterative (+,-) convergence, passive
ICE learning, iterative ICE convergence. Entries include "Houdini" and
"Open".]

OVERVIEW
1. Introduction: Invariant Synthesis and Machine Learning (40m)
   Inductive synthesis of invariants using learning: the iterative ICE model
   The "gaps" we need to bridge in adapting ML algorithms to invariant synthesis
2. Machine learning: Classification algorithms (1:10hr)
   Learning conjunctions, Linear Threshold Functions, Winnow, Perceptron, SVM,
   Decision trees with Boolean and continuous attributes
   Tools: Weka, C5.0
3. Building bridges from ML to Inv Synthesis (1:10hr)
   Tool: ICE-DT (with Demo)
4. Open problems (20m)
   Broader perspective: Inv Synthesis to Program Synthesis

LEARNING CONJUNCTIONS
Concept space: all Boolean functions over b1, ..., bn.
Hypothesis space: conjunctions of atoms among b1, ..., bn.
Note: there are 2^n such conjunctions.
Very simple algorithm: given a sample, return the conjunction consisting of
every bi such that no positive sample sets bi to false.

LEARNING CONJUNCTIONS
Given the sample
  P = { ((1,1,1,1,1), +), ((1,1,0,1,0), +) }
  N = { ((0,0,0,0,0), -), ((1,0,0,1,0), -) }
return b1 & b2 & b4.
Why? This is semantically the smallest set that includes all positive points.
It will exclude the negative points (provided some conjunction satisfies the
sample); so if the result does not exclude all of N, we can abort.

ITERATIVE SETTING (+,-)
Initially, when the sample is empty, propose b1 & b2 & ... & bn.
In each round, the learner proposes the semantically smallest set that
contains the positive points. Hence the teacher can never give a negative
example! Whenever the teacher returns (p, +), "knock off" all conjuncts bi
such that p(bi) = false.
Converges in O(n) rounds! [This is the algorithm Daikon uses.]

LEARNING DISJUNCTIONS
We can also learn disjunctions. To learn a disjunction, we can learn instead
its negation: complement the sample labels and treat the Boolean variables as
negated.
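A minimal sketch of the elimination algorithm on the sample above (attribute
indices are 0-based, so [0, 1, 3] stands for b1 & b2 & b4):

```python
def eliminate(n, positives):
    # Keep exactly the attributes that are true in every positive example:
    # semantically the smallest conjunction containing all positive points.
    return [i for i in range(n) if all(p[i] for p in positives)]

P = [(1, 1, 1, 1, 1), (1, 1, 0, 1, 0)]   # positive points
N = [(0, 0, 0, 0, 0), (1, 0, 0, 1, 0)]   # negative points
conj = eliminate(5, P)                    # 0-based indices of the kept atoms
# Does the learned conjunction exclude every negative point?
consistent = all(not all(x[i] for i in conj) for x in N)
```

Each positive example can only remove atoms, and an atom once removed never
returns; this is the source of the O(n)-round convergence bound.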

Once you learn the formula, negate it.
Caveat: this assumes the negation of the target is itself expressible in the
hypothesis space.

K-CNF AND K-DNF
For a (small) fixed k, we can also learn k-CNF formulas:
- introduce a new variable for each possible disjunction of k variables
  (an O(n^k) blowup)
- compute the samples accordingly
- learn a conjunction of these variables.
Similarly for k-DNF.

MEETING THE CHALLENGES
[Table revisited: the consistent-learning and iterative (+,-) convergence
cells for Conjunctions / Elimination Algorithm are now filled; the ICE columns
still show "Houdini" and "Open" entries.]

LINEAR THRESHOLD FUNCTIONS
Widely used in machine learning. Let the features x1, ..., xn be reals.

Learning functions from R^n to {0,1}.
Hypothesis space: h(x) = 1 iff w1*x1 + ... + wn*xn >= t, where the wi and t
are reals (coefficients may be real-valued, e.g., 3.5).

LINEAR THRESHOLD FUNCTIONS
Linear threshold functions can express Boolean functions: encode
< T, F, F, T, F > as < 1, 0, 0, 1, 0 >; an LTF over these inputs then
represents a Boolean function.
- Disjunction x1 v ... v xn: represented by the LTF x1 + ... + xn >= 1.
- "At least two of x1, ..., xn are true": x1 + ... + xn >= 2.
- Cannot express XOR. Cannot express nontrivial DNF.

LINEAR THRESHOLD FUNCTIONS
Linear threshold functions can hence express some class of Boolean functions,
which includes disjunctions and several other functions. LTFs make the
discrete Boolean function space more continuous, allowing for more robust
learning algorithms in some cases.

WINNOW ALGORITHM
Assume n is very large, but we are learning a disjunction of at most r
variables, with r << n. The iterative elimination algorithm could take O(n)
rounds. Can we do better? Yes: O(r log n) rounds! [Littlestone]
In invariant synthesis applications this is very interesting, since we can
throw in a large number of predicates even though the invariant will be a
small set of conjuncts.

WINNOW ALGORITHM (r << n)
Let's iteratively learn a linear threshold function.

Initialize each weight wi := 1, with threshold n.
- On a positive counterexample x (predicted negative, actually positive):
  for every i such that xi = 1, double wi.
- On a negative counterexample x (predicted positive, actually negative):
  for every i such that xi = 1, halve wi.
A beautiful proof shows that this converges in O(r log n) mistakes if the
target concept has only r disjuncts.

WINNOW ALGORITHM
Theorem: Winnow learns the class of disjunctions with a mistake bound of
O(r log n) when the target concept f is an OR of r variables.
Proof sketch: assume the target is a disjunction of the variables in T.
- Mistakes on positive examples: on any positive example we double the weight
  of at least one element of T, and a negative example cannot reduce the
  weight of any element of T (all variables of T are 0 there). We cannot make
  a positive mistake once some weight in T reaches n, so the number of
  positive mistakes is at most r log n.
- Mistakes on negative examples: consider W = w1 + ... + wn. Each positive
  mistake increases W by at most n (the total active weight was below n, which
  is what caused the example to be predicted negative). Each negative mistake
  decreases W by at least n/2 (the active weight was at least n, and halving
  removes at least half of it). Since W starts at n and stays non-negative,
  some simple math gives
    #neg-mistakes <= 2 * #pos-mistakes + 2.
So the total number of mistakes is O(r log n).

WINNOW ALGORITHM FOR ICE?
Open problem: is there an iterative ICE learning algorithm that can learn
target functions with r conjuncts over n Boolean variables in time O(r log n),
or polynomial in r and sublinear in n?
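The multiplicative updates can be sketched as follows; the choice n = 8 and
target x1 OR x2 are illustrative:

```python
import itertools

def winnow(n, examples, max_passes=100):
    # Weights start at 1; predict positive iff the active weight reaches n.
    w = [1.0] * n
    for _ in range(max_passes):
        mistakes = 0
        for x, y in examples:
            pred = 1 if sum(wi for wi, xi in zip(w, x) if xi) >= n else 0
            if pred != y:
                mistakes += 1
                if y == 1:   # promotion: double weights of active attributes
                    w = [wi * 2 if xi else wi for wi, xi in zip(w, x)]
                else:        # demotion: halve weights of active attributes
                    w = [wi / 2 if xi else wi for wi, xi in zip(w, x)]
        if mistakes == 0:    # a clean pass: the hypothesis is consistent
            break
    return w

n = 8
target = lambda x: 1 if x[0] or x[1] else 0     # an OR of r = 2 variables
examples = [(x, target(x)) for x in itertools.product((0, 1), repeat=n)]
w = winnow(n, examples)
```

Since the total number of mistakes is O(r log n), cycling over a fixed sample
quickly reaches a pass with no mistakes, after which the hypothesis is frozen.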

PERCEPTRON ALGORITHM
Iteratively learn a linear threshold function; the update rule is additive:
- on a positive counterexample x, for every i such that xi = 1,
  set wi := wi + 1;
- on a negative counterexample x, for every i such that xi = 1,
  set wi := wi - 1.
Mistake bounds exist; they depend on the margin [Novikoff 63].

MEETING THE CHALLENGES
[Table revisited: consistent learning and iterative (+,-) convergence are now
settled for the Elimination Algorithm, Winnow, and SVM rows; the ICE columns
remain "Houdini" for conjunctions and "Open" elsewhere.]
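The additive update can be sketched as a standard perceptron with a bias term;
the target (a small disjunction over four variables) is an illustrative
choice:

```python
import itertools

def perceptron(examples, n, max_passes=1000):
    # Additive updates: w += x on a false negative, w -= x on a false positive.
    w, b = [0.0] * n, 0.0
    for _ in range(max_passes):
        mistakes = 0
        for x, y in examples:                    # y in {+1, -1}
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                mistakes += 1
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
        if mistakes == 0:                        # linearly separable => halts
            break
    return w, b

n = 4
examples = [(x, 1 if x[0] or x[1] else -1)
            for x in itertools.product((0, 1), repeat=n)]
w, b = perceptron(examples, n)
```

By Novikoff's theorem, the number of updates on separable data is bounded by
(R / gamma)^2, so the loop above terminates with a consistent separator.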

AND NOW TO PRANAV

MEETING THE CHALLENGES
[Table revisited, unchanged.]

SUPPORT VECTOR MACHINES
Learn  w1*x1 + ... + wn*xn + b >= 0,  where x in {0,1}^n,
which represents a Boolean function.

Support vector machines: among all separating hyperplanes, pick the one with
the largest margin!

Parameterizing the decision boundary; how do we compute the margin?
Take a point x1 on the boundary (w^T x1 + b = 0) and move a distance d along
the unit normal w/||w|| to the margin line w^T x + b = a:
  w^T (x1 + d * w / ||w||) + b = a
  (w^T x1 + b) + d * ||w|| = a
so d = a / ||w||, i.e., margin = |a| / ||w||.

SUPPORT VECTOR MACHINES
With labels yi in {1, -1}:
  max_{w,b}  1 / ||w||   subject to  (w^T xi + b) yi >= a.
Normalizing a = 1:
  max_{w,b}  1 / ||w||   subject to  (w^T xi + b) yi >= 1,
or equivalently
  min_{w,b}  ||w||^2     subject to  (w^T xi + b) yi >= 1.
This is a quadratic programming (QP) problem: QP solvers, gradient descent.

CONSISTENCY AND ITERATIVE CONVERGENCE
If the Boolean formula is expressible as a linear threshold function, the
concept (LTF) learned is consistent with the sample. (Extensions: soft-margin
SVMs.)
Convergence: there are a finite number of Boolean formulas, and the current
hypothesis is (semantically) different from all previous hypotheses, since the
sample contains at least one counterexample for every previous hypothesis.
Hence the above learning algorithm is convergent in the iterative setting.
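The QP above can be approximated with a simple subgradient method on the hinge
loss; this sketch (learning the disjunction x1 OR x2, with the regularizer
dropped for simplicity) is a stand-in for a real QP solver, not the exact
maximum-margin solution:

```python
def hinge_sgd(samples, n, passes=3000, lr=0.01):
    # Subgradient steps on the hinge loss: update whenever the functional
    # margin y * (w.x + b) falls below 1 (regularizer omitted for simplicity).
    w, b = [0.0] * n, 0.0
    for _ in range(passes):
        for x, y in samples:                     # y in {+1, -1}
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

samples = [((0, 0), -1), ((0, 1), +1), ((1, 0), +1), ((1, 1), +1)]  # x1 OR x2
w, b = hinge_sgd(samples, 2)
```

On separable data the updates eventually stop, leaving every sample at
functional margin at least 1, which is exactly the QP's feasibility condition.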

SUPPORT VECTOR MACHINES FOR ICE?
Open problem: how do you define margins in the presence of implications?

MEETING THE CHALLENGES
[Table revisited, unchanged.]

IF THE SAMPLE IS NOT LINEARLY SEPARABLE
Any linear classifier would introduce errors on the sample.
To deal with possible inconsistencies:
- move to a more expressive feature space, or
- learn over a more expressive hypothesis space.

MORE EXPRESSIVE FEATURE SPACE
Example: (x1 & x2 & x4) v (x2 & x4 & x5) v (x1 & x3 & x7) becomes the simple
disjunction y3 v y4 v y7 in the new space.
Space X: x1, x2, x3, ...
New space Y: { xi, xi & xj, xi & xj & xk, ... }

MORE EXPRESSIVE HYPOTHESIS SPACE
Bn = {b1, b2, ..., bn}
Concept space: all Boolean functions over Bn
Hypothesis space: all Boolean functions over Bn
Note: there are O(2^(2^n)) such Boolean formulas.

DECISION TREES
A hierarchical data structure that expresses arbitrary Boolean functions.
Example: a decision tree for "play tennis?":
  Outlook?
  - Sunny    -> Humidity?  High -> No;  Normal -> Yes
  - Overcast -> Yes
  - Rainy    -> Wind?  Strong -> No;  Weak -> Yes
Each internal node is labeled with a (discrete) attribute; each branch
corresponds to a value of the attribute; each leaf node is labeled with a
classification.

WILL MY FRIEND PLAY TENNIS TODAY?
Features:
  Outlook: { Sunny, Overcast, Rainy }
  Temperature: { Hot, Mild, Cold }
  Humidity: { High, Normal }

  Wind: { Strong, Weak }
Classification: binary (play tennis?): { Yes, No }

WILL MY FRIEND PLAY TENNIS TODAY?

  Outlook  | Temperature | Humidity | Wind   | Tennis?
  ---------|-------------|----------|--------|--------
  Sunny    | Hot         | High     | Weak   | No
  Sunny    | Hot         | High     | Strong | No
  Overcast | Hot         | High     | Weak   | Yes
  Rainy    | Mild        | High     | Weak   | Yes
  Rainy    | Cold        | Normal   | Weak   | Yes
  Rainy    | Cold        | Normal   | Strong | No
  Overcast | Cold        | Normal   | Strong | Yes
  Sunny    | Mild        | Normal   | Weak   | Yes

[The decision tree shown earlier (Outlook at the root, then Humidity or Wind)
classifies this sample correctly.]

LEARNING DECISION TREES: ID3 [Quinlan 1986]
procedure ID3(Sample, Attributes) {
  Create a root node of the tree.
  if all examples are positive or all negative then
    return a single-node tree with label + or -
  else
    select a good decision attribute A for Sample

    label the root of the tree with this attribute A
    divide Sample into the sets Sample_A and Sample_notA
    return the tree with this root, left subtree ID3(Sample_A, Attributes)
    and right subtree ID3(Sample_notA, Attributes)
}

LEARNING DECISION TREES: ID3 [Quinlan 1986]
- Polynomial-time algorithm; no backtracking.
- Linear in the size of the output tree and the sample.
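A compact sketch of ID3 on the play-tennis sample (rows as dicts; the
information-gain heuristic for choosing the split attribute is the one
discussed on the following slides):

```python
import math
from collections import Counter

def entropy(labels):
    c, n = Counter(labels), len(labels)
    return -sum((k / n) * math.log2(k / n) for k in c.values() if k)

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:
        return labels[0]                         # pure leaf: stop
    def gain(a):
        g = entropy(labels)
        for v in {r[a] for r in rows}:
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            g -= len(sub) / len(rows) * entropy(sub)
        return g
    a = max(attrs, key=gain)                     # greedy split, no backtracking
    branches = {}
    for v in {r[a] for r in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[a] == v]
        branches[v] = id3([r for r, _ in sub], [l for _, l in sub],
                          [x for x in attrs if x != a])
    return (a, branches)

def classify(tree, row):
    while isinstance(tree, tuple):
        a, branches = tree
        tree = branches[row[a]]
    return tree

data = [("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
        ("Overcast","Hot","High","Weak","Yes"), ("Rainy","Mild","High","Weak","Yes"),
        ("Rainy","Cold","Normal","Weak","Yes"), ("Rainy","Cold","Normal","Strong","No"),
        ("Overcast","Cold","Normal","Strong","Yes"), ("Sunny","Mild","Normal","Weak","Yes")]
attrs = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = [dict(zip(attrs, d[:4])) for d in data]
labels = [d[4] for d in data]
tree = id3(rows, labels, attrs)
```

With pruning disabled (as here), the learned tree is consistent with every
training row, which is the property the iterative setting needs.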

Pick decision attributes in a good way: this is important, since we do not
backtrack. Inductive bias: smaller trees.
Note: even though picking good decision attributes is fuzzy, the learning
algorithm by itself is not fuzzy; bad ways of dividing the sample just lead to
larger trees.

PICKING A GOOD DECISION ATTRIBUTE
[Figure: splitting [100+,100-] on attribute A yields [100+,0-] and [0+,100-]
(a perfect split), while splitting on attribute B yields [50+,50-] and
[50+,50-] (no progress).]
[Figure: splitting [100+,100-] on A yields [80+,20-] and [20+,80-]; splitting
on B yields [70+,30-] and [30+,70-]; further splits such as [53+,40-] /
[23+,17-] and [30+,23-] appear deeper in the tree.]

ENTROPY
S is a sample of training examples; p+ / p- is the proportion of positive /
negative examples in S. Entropy measures the impurity of S:
  Entropy(S) = - p+ log2 p+ - p- log2 p-

INFORMATION GAIN
Gain(S, A) is the expected reduction in entropy on partitioning the sample S
with respect to attribute A:
  Gain(S, A) = Entropy(S) - sum over v in values(A) of (|Sv|/|S|) * Entropy(Sv)

[Figure: S = [100+,100-] with Entropy(S) = 1.0. Attribute A splits S into
[80+,20-] and [20+,80-], each with entropy 0.72; attribute B splits S into
[70+,30-] and [30+,70-], each with entropy 0.88.]
  Gain(S, A) = 1.0 - (100/200 * 0.72 + 100/200 * 0.72) = 0.28
  Gain(S, B) = 1.0 - (100/200 * 0.88 + 100/200 * 0.88) = 0.12
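A quick check of the two gains from the figure (counts given as
(positives, negatives)):

```python
import math

def entropy(p, n):
    # Entropy of a sample with p positive and n negative examples.
    tot = p + n
    return -sum((k / tot) * math.log2(k / tot) for k in (p, n) if 0 < k < tot)

def gain(parent, splits):
    # Information gain of partitioning `parent` into the given `splits`.
    tot = sum(parent)
    return entropy(*parent) - sum((p + n) / tot * entropy(p, n)
                                  for p, n in splits)

gA = gain((100, 100), [(80, 20), (20, 80)])   # attribute A from the figure
gB = gain((100, 100), [(70, 30), (30, 70)])   # attribute B from the figure
```

Attribute A's larger gain is why the heuristic prefers it: its children are
purer, so fewer further splits (a smaller tree) are expected.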

DEMO
Comparing the information gain heuristic with a random splitting criterion.

CONSISTENCY AND ITERATIVE CONVERGENCE
The decision tree learned is consistent with the sample, provided we disable
the default settings related to pruning (MDL; not further partitioning a
sub-sample with few examples).
Convergence: there are a finite number of Boolean formulas, and the current
hypothesis is (semantically) different from all previous hypotheses, since the
sample contains counterexamples to all previous hypotheses. Hence the decision
tree learner is convergent in the iterative setting.

MEETING THE CHALLENGES
[Table revisited: the decision-tree row over Boolean attributes is now settled
for consistency and iterative (+,-) convergence; the ICE columns remain open.]

DECISION TREES OVER CONTINUOUS ATTRIBUTES
Sn = {b1, b2, ..., bn, v1, v2, ..., vm}, where the bi range over
{True, False} and the vi range over Z.
Concept space: all functions from valuations of Sn to {True, False}.
Hypothesis space: all Boolean functions over the predicates bi and vi <= c,
for c in Z. Note: there are infinitely many such formulas.

DECISION TREES OVER CONTINUOUS ATTRIBUTES
Example sample (pass/fail as a function of two exam scores):

  Exam-1 | Exam-2 | Passed?
  -------|--------|--------
  5      | 15     | No
  10     | 80     | Yes
  15     | 45     | No
  25     | 25     | No
  40     | 68     | No
  50     | 10     | No
  55     | 90     | Yes
  60     | 45     | No
  70     | 10     | No
  75     | 45     | Yes
  80     | 80     | Yes
  85     | 10     | Yes
  95     | 25     | Yes
  95     | 55     | Yes

For the thresholds c, it suffices to consider only the values occurring in the
sample. [Plots: the sample in the (Exam-1, Exam-2) plane, with candidate
thresholds Exam-1 <= 70 and Exam-2 <= 68.]

DECISION TREES OVER CONTINUOUS ATTRIBUTES
The learned tree:
  Exam-1 <= 70?
  - no  -> Pass
  - yes -> Exam-2 <= 68?
           - yes -> Fail
           - no  -> Pass
The decision tree learned is consistent with the sample (no pruning, etc.).
Iterative convergence: hard to build (described later).
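Using only sample values as candidate thresholds, the best first split on the
exam data can be found by exhaustive search (a sketch; labels are 1 = Pass,
0 = Fail):

```python
import math

def entropy(pos, neg):
    tot = pos + neg
    return -sum((k / tot) * math.log2(k / tot)
                for k in (pos, neg) if 0 < k < tot)

# (Exam-1, Exam-2, passed?) rows from the slide's sample.
sample = [(5,15,0), (10,80,1), (15,45,0), (25,25,0), (40,68,0), (50,10,0),
          (55,90,1), (60,45,0), (70,10,0), (75,45,1), (80,80,1), (85,10,1),
          (95,25,1), (95,55,1)]

def best_split(sample):
    pos = sum(l for *_, l in sample)
    base = entropy(pos, len(sample) - pos)
    best = None
    for attr in (0, 1):                    # Exam-1, Exam-2
        for c in sorted({row[attr] for row in sample}):   # sample thresholds
            left = [r for r in sample if r[attr] <= c]
            right = [r for r in sample if r[attr] > c]
            lp, rp = sum(r[2] for r in left), sum(r[2] for r in right)
            g = base - (len(left) / len(sample)) * entropy(lp, len(left) - lp) \
                     - (len(right) / len(sample)) * entropy(rp, len(right) - rp)
            if best is None or g > best[0]:
                best = (g, attr, c)
    return best
```

The search recovers Exam-1 <= 70 as the root test of the tree above: it sends
all five students with Exam-1 > 70 to a pure "Pass" leaf.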

CHALLENGES TO USING MACHINE LEARNING FOR INVARIANT SYNTHESIS
1. Removing fuzziness from machine learning algorithms:
   how do we make them learn hypotheses consistent with the sample?
2. Will they converge, even with (+,-) counterexamples, to a target concept?
3. How can we adapt them to handle implication counterexamples (ICE)?
4. How do we make these ICE learning algorithms converge?

DEMO FOR WEKA

MEETING THE CHALLENGES
[Table revisited, unchanged.]

ICE: LEARNING INVARIANTS USING IMPLICATION COUNTER-EXAMPLES
([Garg et al. CAV 14])
[Diagram: the verification oracle (a verifier checking the hypothesis against
program + spec) produces an ICE sample set with positive points, negative
points, and implication pairs; the learner proposes an invariant.]

Hypothesis H?

EXTENDING THE ELIMINATION ALGORITHM TO ICE
Bn = {b1, b2, ..., bn}
Concept space: all Boolean functions over Bn.
Hypothesis space: conjunctions of atoms over Bn.
Main idea: return the smallest set that includes all positive points and is
consistent with the implication counterexamples.

EXTENDING THE ELIMINATION ALGORITHM TO ICE
Input: sample { (p1,+), ..., (pu,+), (n1,-), ..., (nv,-), (s1,t1), ..., (sw,tw) }
Algorithm:
  phi := conjunction of all bi such that no positive example (p,+) has
         p(bi) = F
  while (phi is not consistent with all implications) {
    pick an implication (s,t) inconsistent with phi, i.e., s satisfies phi
    but t does not;
    update phi to remove all atoms bi where t(bi) = F
  }
phi excludes the negative points (provided some conjunction satisfies the
sample).

CONVERGENCE IN ITERATIVE ICE
Initially, when the sample is empty, propose b1 & b2 & ... & bn.
In each round, the learner proposes the semantically smallest set that
contains the positive and implication counterexamples. Hence the teacher can
never give a negative counterexample. Whenever the teacher returns (p,+) or
(s,t), knock off all conjuncts bi for which p(bi) = F or t(bi) = F.
Converges in O(n) rounds! (This is precisely the Houdini algorithm, which
learns invariants that are conjunctions over atomic predicates.)
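A sketch of the ICE elimination loop (essentially Houdini); points are 0/1
tuples and the conjunction is represented as the set of kept attribute
indices:

```python
def ice_elimination(n, positives, implications):
    # Start from the atoms true in every positive point, then repair
    # inconsistent implications by dropping atoms false at the consequent.
    conj = {i for i in range(n) if all(p[i] for p in positives)}
    sat = lambda s: all(s[i] for i in conj)      # does s satisfy the conjunction?
    changed = True
    while changed:
        changed = False
        for s, t in implications:
            if sat(s) and not sat(t):            # implication (s, t) violated
                conj = {i for i in conj if t[i]}
                changed = True
    return conj

positives = [(1, 1, 0)]
implications = [((1, 1, 1), (1, 0, 1))]
conj = ice_elimination(3, positives, implications)
```

Here the positive point keeps b1 & b2; the implication then forces b2 out
(its consequent falsifies b2), leaving the conjunction b1. As in the
(+,-) case, atoms are only ever removed, giving O(n)-round convergence.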

MEETING THE CHALLENGES
[Table revisited: the ICE columns are now filled for conjunctions (Houdini);
Winnow and SVM under ICE remain open.]

ADAPTING DECISION TREE LEARNING TO ICE
[Plot: an ICE sample over attributes x1 and x2, with positive points, negative
points, and implication pairs, separated by predicates such as x1 <= -10 and
x1 + x2 <= -2.]

LEARNING DECISION TREES GIVEN AN ICE SAMPLE
[Plot: a decision tree over predicates such as x1 <= 10, x2 <= 5, and
x1 + x2 <= 2 that is consistent with the ICE sample.]

[Garg et al. UIUC-TR-2015]
Theorem: the decision-tree based learning algorithm always produces a decision
tree that is consistent with the input ICE sample.
Valid sample:

A labeling of the sample is valid if no point is classified as both + and -,
and for every implication (s1, s2): if label(s1) = + then label(s2) = +, and
if label(s2) = - then label(s1) = -.
Lemma: in a valid sample, if we label a subset of implication endpoints as
purely positive (or purely negative), the result is a classification that is
consistent with the ICE sample.

Ignoring implications while picking the decision attribute is not a good idea.

PICKING A GOOD DECISION ATTRIBUTE

Measure 1: Penalize implications cut by the attribute:

  InfoGain_ICE(S1, S2) = InfoGain_classical(S1, S2) - Penalty(S1, S2, Implications)

  Penalty(S1, S2, Implications) =
      (|S1_Pos| / |S1|) · (|S2_Neg| / |S2|) · |I(S1 -> S2)|
    + (|S2_Pos| / |S2|) · (|S1_Neg| / |S1|) · |I(S2 -> S1)|

where I(S1 -> S2) is the set of implications whose source lies in S1 and whose target lies in S2 (and symmetrically for I(S2 -> S1)).

PICKING A GOOD DECISION ATTRIBUTE

Measure 2: Extend Shannon entropy and information gain to implications using probabilistic dependencies:

  InfoGain_classical = Entropy(S) - (|S1| / |S|) · Entropy(S1) - (|S2| / |S|) · Entropy(S2)

  Entropy(S) = -P(S,+) · log2 P(S,+) - P(S,-) · log2 P(S,-)

  P(S,+) = (1 / |S|) · Σ_{x ∈ S} P(x = +),   P(S,-) = (1 / |S|) · Σ_{x ∈ S} P(x = -)
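Measure 1 can be sketched as below, a minimal illustration assuming pre-computed counts are passed in directly; the function names and counts-based interface are illustrative, not the paper's actual API.

```python
from math import log2

def entropy(pos, neg):
    """Shannon entropy of a set with `pos` positive and `neg` negative points."""
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -p * log2(p) - (1 - p) * log2(1 - p)

def info_gain_ice(s1, s2, i_12, i_21):
    """Measure 1: classical information gain minus a penalty for implications
    cut by the split.  Assumes both sides of the split are non-empty.

    s1, s2 : (pos, neg) counts of classified points in the two sides
    i_12   : number of implications with source in S1 and target in S2
    i_21   : number of implications with source in S2 and target in S1
    """
    p1, n1 = s1
    p2, n2 = s2
    t1, t2 = p1 + n1, p2 + n2
    gain = (entropy(p1 + p2, n1 + n2)
            - (t1 / (t1 + t2)) * entropy(p1, n1)
            - (t2 / (t1 + t2)) * entropy(p2, n2))
    # Each cut implication is charged by the likelihood that its source side
    # looks positive while its target side looks negative.
    penalty = (p1 / t1) * (n2 / t2) * i_12 + (p2 / t2) * (n1 / t1) * i_21
    return gain - penalty
```

A perfectly separating split of two positives and two negatives has gain 1.0; a single implication cut from the positive side to the negative side cancels it entirely in this toy setting.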

where the point probabilities satisfy:

  P(x = +) = 1 for x ∈ Pos,   P(x = +) = 0 for x ∈ Neg,
  and P(x1 = +) ≤ P(x2 = +) for every implication (x1, x2).

P(S,+) can then be expressed as a conditional probability.

PICKING A GOOD DECISION ATTRIBUTE

Measure 2 (continued): P(S,+) is the root in [0, 1] of the following quadratic equation over t:

  i·t² + (p + n - i)·t - p = 0

where p, n, i are the numbers of positive points, negative points and implications in S. (For i = 0 this reduces to the classical fraction t = p / (p + n).)
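A small sketch of Measure 2's probability and entropy computation follows. Note the sign placement in the quadratic is an assumption reconstructed from the garbled slide, chosen so that i = 0 recovers the classical fraction p / (p + n); the function names are illustrative.

```python
from math import sqrt, log2

def p_positive(p, n, i):
    """Solve i*t^2 + (p + n - i)*t - p = 0 for the root in [0, 1].

    p, n, i : numbers of positive points, negative points and implications
    in the sample S.  With i = 0 this reduces to p / (p + n).
    """
    if i == 0:
        return p / (p + n)
    a, b, c = i, p + n - i, -p
    # The product of the roots is c/a <= 0, so exactly one root is
    # non-negative; it lies in [0, 1] since f(0) = -p <= 0 and f(1) = n >= 0.
    return (-b + sqrt(b * b - 4 * a * c)) / (2 * a)

def entropy_ice(p, n, i):
    """Entropy of S using P(S,+) = p_positive and P(S,-) = 1 - P(S,+)."""
    t = p_positive(p, n, i)
    if t in (0.0, 1.0):
        return 0.0
    return -t * log2(t) - (1 - t) * log2(1 - t)
```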

MEETING THE CHALLENGES

  Conjunctions:          consistent learning via Elimination Alg;  iterative (+,-) convergence ✓;  ICE: Houdini (passive and iterative)
  Conjunctions (LTF):    Winnow ✓;  ICE: Open / --
  Conjunctions (LTF):    SVM ✓;  ICE: Open / --
  Boolean functions:     Decision trees ✓
  Bool + Threshold fns:  Decision trees with continuous attributes ✓

BUILDING A CONVERGENT DECISION TREE LEARNER FOR POSITIVE/NEGATIVE SAMPLES

Recall the hypothesis space: all Boolean functions over predicates bi and vi ≤ c, for c ∈ Z, where the bi range over {True, False} and the vi range over Z.
Note: there are infinitely many such formulas.

BUILDING A CONVERGENT DECISION TREE LEARNER FOR POSITIVE/NEGATIVE SAMPLES

Restrict the thresholds c ∈ Z to values such that |c| < m.

For a fixed m ∈ Z+, the set of Boolean functions over predicates bi and vi ≤ c with |c| < m, c ∈ Z, is finite.

Main loop: if possible, construct (via ID3) a consistent decision tree with thresholds of magnitude at most m; otherwise, increment m.
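The main loop can be sketched as an iterative-deepening wrapper around a bounded learner. Here `learn_tree_bounded` and the 1-D `toy_learner` are illustrative stand-ins for ID3 restricted to thresholds of magnitude below m, not the actual implementation.

```python
def learn_with_growing_thresholds(sample, learn_tree_bounded, m0=1):
    """Main loop from the slide: try to build a tree consistent with
    `sample` using only threshold constants c with |c| < m; if none
    exists, increment m and retry.  For each m the hypothesis space is
    finite, and the target (if expressible) uses some finite thresholds,
    so the loop terminates on learnable samples.

    learn_tree_bounded(sample, m) returns a consistent hypothesis or None.
    """
    m = m0
    while True:
        tree = learn_tree_bounded(sample, m)
        if tree is not None:
            return tree, m
        m += 1


# Toy 1-D instance: hypotheses of the form "x <= c" with |c| < m.
def toy_learner(sample, m):
    pos, neg = sample
    for c in range(-m + 1, m):
        if all(x <= c for x in pos) and all(x > c for x in neg):
            return ('x <=', c)
    return None

tree, m = learn_with_growing_thresholds(([0, 3], [7, 9]), toy_learner)
# separating threshold x <= 3 is first expressible once m reaches 4
```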

If ID3 fails to find a consistent tree within the current bound, increment m and rerun.

Theorem: If the target f is expressible as a Boolean combination of the attributes with arbitrary inequalities, then the above decision-tree based learning algorithm will always eventually learn f.

BUILDING A CONVERGENT DECISION TREE LEARNER FOR ICE

The above algorithm does not work for ICE!
- + and - examples cannot be separated by any predicate with thresholds at most m.
- There might be implication end-points labeled +/- by the learning algorithm itself.
- Back-tracking is hard to control!

BUILDING A CONVERGENT DECISION TREE LEARNER FOR ICE

Equivalence relation ~m: s1 ~m s2 iff no predicate with thresholds of magnitude at most m separates s1 and s2.

BUILDING A CONVERGENT DECISION TREE LEARNER FOR ICE

Theorem: The above decision-tree ICE learning algorithm converges to an invariant, if an invariant exists that is expressible as a Boolean combination of the attributes with arbitrary inequalities.

MEETING THE CHALLENGES

  Conjunctions:          consistent learning via Elimination Alg;  iterative (+,-) convergence ✓;  ICE: Houdini (passive and iterative)
  Conjunctions (LTF):    Winnow ✓;  ICE: Open / --
  Conjunctions (LTF):    SVM ✓;  ICE: Open / --
  Boolean functions:     Decision trees ✓
  Bool + Threshold fns:  Decision trees with continuous attributes ✓

ICE-DT: SYSTEM ARCHITECTURE

Boogie

+ Z3 (SMT solver) form the verification oracle (the teacher). The learner generates invariants using the ICE algorithm for decision trees; the teacher checks each proposed invariant H and returns ICE counterexamples to the learner.

Attributes for the learner:
- Octagonal attributes ±x ± y ≤ c, for every pair of integer variables x, y.
- Additional attributes can be specified manually, e.g., non-linear terms such as x², y².

INVARIANT SYNTHESIS USING ICE-DT

Invariant synthesis for more than 50 programs from various sources:
- Up to 100 LOC
- 5-15 variables
- 50-200 attributes
Won the SyGuS competition on the invariant-synthesis track (vs. CVC4, enumeration, stochastic search, sketch, ...).

SOME INVARIANTS LEARNED

[Several learned invariants shown; the formulas did not survive extraction.]

The invariants involve:
- Arbitrary Boolean functions (up to 50 atomic predicates)
- Large constants
- Non-linear terms

RESULTS

[Table: per-program running times (seconds) for ICE-CS [CAV-14] vs ICE-DT [UIUC-TR-15] on benchmarks including afnp, multiply, square, arrayinv1, inc-loops, dillig12, dillig28, and dtuc; the per-cell alignment did not survive extraction.]

Aggregate: ICE-CS solves 50/58 programs; ICE-DT solves 58/58 programs.

COMPARISON TO WHITE-BOX INVARIANT SYNTHESIS

Invariant synthesis using interpolation [CPAchecker]: 41/58 programs.
ICE Decision Tree Learner:

55/58 programs.

(Note: CPAchecker does not restrict itself to invariants over octagonal attributes.)

White-box invariant synthesis is hard:
- Invariants over non-linear terms
- Synthesizing invariants in partially annotated programs (e.g., Houdini)

ICE-DT DEMO

[Architecture diagram: Boogie + Z3 (SMT solver) as the verification oracle; the ICE decision-tree learner proposes invariants H and receives ICE counterexamples.]

The learner is available as a separate module and can be combined with other verification engines.

CHALLENGES TO USING MACHINE LEARNING FOR INVARIANT SYNTHESIS

1. Removing fuzziness from machine learning algorithms: how do we make them learn hypotheses consistent with the sample?
2. Will they converge, even with (+,-) counterexamples, to any target concept?
3. How can we adapt them to handle implication counterexamples (ICE)?
4. How do we make these ICE learning algorithms converge?

Over to Madhu.

MEETING THE CHALLENGES

  Conjunctions:          consistent learning via Elimination Alg;  iterative (+,-) convergence ✓;  ICE: Houdini (passive and iterative)
  Conjunctions (LTF):    Winnow ✓;  ICE: Open / --
  Conjunctions (LTF):    SVM ✓;  ICE: Open / --
  Boolean functions:     Decision trees ✓
  Bool + Threshold fns:  Decision trees with continuous attributes ✓

SOME OPEN PROBLEMS

- Winnow for ICE, with convergence linear in r and sublinear in n.
- SVM for ICE.
- We have only scratched the surface of Boolean + Thresholds: can we learn Boolean combinations of linear threshold functions?

SOME OPEN PROBLEMS

Black-box learning of invariants in more complex settings:
- Quantified array invariants: array property fragment / e-matching / Strand
- Heap invariants: separation logic / decidable fragments / natural proofs

Applications:
- Verifying GPU kernels for race freedom (GPUVerify)
- Memory safety

FROM INVARIANT SYNTHESIS TO EXPRESSION SYNTHESIS

In program synthesis, inductive synthesis is the norm!

Expression synthesis: find an expression e such that a specification φ(e) holds.

CEGIS loop:
while (true) {
  - Propose an e that works for every valuation in S.
  - The oracle checks whether φ(e) holds. If yes, done: HALT.
  - If not, the oracle finds a new counterexample valuation for e.
  - Add this counterexample to S.
}
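The CEGIS loop above can be sketched as follows; the `propose`/`check` callbacks and the toy "synthesize a constant" specification are illustrative, not any particular solver's interface.

```python
def cegis(propose, check, max_rounds=100):
    """Counterexample-guided inductive synthesis, as in the slide.

    propose(S): return an expression that works for every valuation in S.
    check(e)  : return None if the specification holds for e, else a new
                counterexample valuation.
    """
    S = []
    for _ in range(max_rounds):
        e = propose(S)
        cex = check(e)
        if cex is None:
            return e      # specification holds: HALT
        S.append(cex)     # learn from the counterexample
    raise RuntimeError("no expression found within the round budget")


# Toy instance: synthesize a constant e with e >= x for all x in 0..5.
def propose(S):
    return max(S, default=0)

def check(e):
    for x in range(6):
        if e < x:
            return x      # counterexample valuation
    return None

result = cegis(propose, check)
```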

[Diagram: the Teacher maintains the sample S; the Learner proposes hypotheses H.]

Example: invariant synthesis
- Includes Init (+ve sample)
- Excludes Bad (-ve sample)
- Inductive (implication sample)

The teacher knows properties of the target and gives samples showing that H does not satisfy them.

[Diagram: the same Teacher/Learner loop, with grounded formulas as samples.]

Example: SyGuS — the samples are grounded formulas.

SyGuS solvers work with such samples:
- Enumerative
- Stochastic
- Symbolic
- Sketch

Question: Can we adapt machine learning algorithms to work on such samples?

Alchemist [CAV15]: a technique based on geometry and machine learning for a small class of SyGuS problems. A newer version, Alchemist_CSDT, can learn specifications that demand ``point-wise'' functions f expressible in LIA:

- Combination of constraint solvers and decision-tree learning.
- Alchemist_CSDT did well in the SyGuS competition.

REFERENCES

- T. M. Mitchell. Machine Learning. McGraw-Hill Series in Computer Science. McGraw-Hill, 1997.
- M. Kearns and U. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
- Pranav Garg, Christof Löding, P. Madhusudan, and Daniel Neider. ICE: A Robust Learning Framework for Synthesizing Invariants. CAV 2014.
- Pranav Garg, Daniel Neider, P. Madhusudan, and Dan Roth. Learning Invariants using Decision Trees and Implication Counterexamples. July 2015.
- J. R. Quinlan. Induction of Decision Trees. Machine Learning 1(1), 1986.
- Nick Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm. Machine Learning 2(4):285-318, 1988.
- Dan Roth. CS446 Machine Learning, course at UIUC: http://l2r.cs.illinois.edu/~danr/Teaching/CS446-14/schedule.html
- Christof Löding, P. Madhusudan, and Daniel Neider. Abstract Learning Frameworks for Synthesis. Technical report.

CONCLUSIONS

Several machine learning algorithms are robust, or can be made robust, for use in verification problems:
- Learning Boolean formulas over atomic predicates: conjunctions, linear threshold functions (SVMs, Winnow,

Perceptron), and decision trees (Boolean + numerical attributes).

Bridges:
- Understand the algorithms and make sure they learn consistent hypotheses.
- Adapt them to ICE.
- Build in convergence in the iterative model.

Inductive synthesis (ML -> Inv Syn) is emerging as a viable alternative to white-box synthesis.

THANK YOU. QUESTIONS?