Propositional Approaches to First-Order Theorem Proving David A. Plaisted UNC Chapel Hill May 2004 History of AI Early emphasis on general methods Newell Shaw Simon GPS

Robinson 1965 resolution Cordell Green question answering Shift to specialized techniques Feigenbaum Expert Systems Is logic a suitable basis for AI? 02/06/20 Approaches to AI

Weak vs. strong methods in AI Declarative vs. procedural knowledge My interest: general logic-based approaches 02/06/20 Aristotle on Deduction A deduction is speech (logos) in which, certain things having been supposed, something different from those supposed results of necessity because of their being so. (Prior Analytics I.2, 24b18-20) Proof

Proof is the idol before whom the pure mathematician tortures himself. -- Sir Arthur Eddington You may prove anything by figures. --Thomas Carlyle What is now proved was once only imagined. -- William Blake Proof You cannot demonstrate an emotion or prove an aspiration. -- John Morley Prove all things; hold fast

that which is good. -Bible, I Thessalonians Logic No, no, you're not thinking; you're just being logical. -Niels Bohr Logic is one thing and commonsense another. -Elbert Hubbard, The Note Book, 1927 Theorem Proving Potentially a key technology for AI

An unsolved problem Weak versus strong methods Problems with resolution Brittleness problem for expert systems Impact on entire field Importance of space versus time Theorem Proving on a Computer

Speed and accuracy of computers People get tired and make mistakes How do people prove theorems? Potential applications Hardware verification Software verification AI and expert systems Robots Deductive Databases Semantic web and query answering

Mathematics research Education Current theorem provers Largely syntactic Resolution or ME (tableau) based First-order provers are often poor on non-Horn clauses Rarely can solve hard problems Human interaction needed for hard problems 02/06/20

How do humans prove theorems? Semantics Case analysis Sequential search through space of possible structures Focus on the theorem People versus computers In a few areas computers are faster Propositional calculus Equational logic Geometry

More to come in the future In general people are much better. Why? Humans use semantics Computers use syntax in most cases The future Will provers soon be much more powerful than they are now? Will they ever be much more powerful than humans?

Organization of the talk History of ATP Contributions of Martin Davis Contributions of Alan Robinson Achievements of Provers Propositional Calculus Propositional Resolution Horn Clauses Davis and Putnams Method The Satisfiability Threshold Propositional Calculus (continued)

Performance Obtained Applications Semantics in Theorem Proving First Order Logic Clause form and Herbrands theorem Criteria for evaluating provers Resolution Otter

Model elimination Matings Propositional approaches to first order logic Clause Linking Disconnection Calculus Disconnection Calculus Theorem Prover First-Order DPLL Method Replacement Rules

Definitions OSHL with semantics Comments on CADE system competition David Hilbert Hilberts goal was to mechanize mathematics. Hilberts Program. Goedel showed that this is impossible. Automatic theorem proving tries to

mechanize what can be mechanized. Martin Davis Theorem Proving on Computers Davis and Putnams Method Clause Form Refutational Theorem Proving Foreshadowing of Resolution Alan Robinson Resolution in First-Order Logic Unification in a Clause Form

Refutational Prover Many non-resolution methods are still in this tradition First reasonably powerful theorem prover for first-order logic Achievements of Provers

Robbins Problem Solution Proof of Cantors Theorem Hardware Verification Prolog Constraints Quasigroup existence and nonexistence Equivalential calculus axiom systems Euclidean and non-Euclidean geometry Achievements of Provers Verification of communication networks

Basketball scheduling Planning RRTP and description logic Propositional Calculus Formulae are composed of Boolean variables p,q,r, and Boolean connectives: (conjunction, and) (disjunction, or) (negation, not) (implication, if then) (equivalence, if and only if)

Example formula Interpretation: It is raining and It is Tuesday implies It is raining. Another interpretation:

pqp All birds are green and All fish are purple implies All birds are green. Both interpretations make the formula true. The formula is valid (true in all interps.) Another example formula: Interpretation:

2=2 3=3 2 2 Another interpretation: pqp 2=2 3 3 2 2 The first interpretation makes the formula false. The second makes it true. The formula is not valid.

Truth Tables 1st Conjunct A true true false false Truth Table for Conjunctions 2nd Conjunct B true false true false Statement A B true false

false false 1st Disjunct A true true false false Truth Table for Disjunctions 2nd Disjunct B true false true false Statement A B true

true true false Antacedent A true true false false Truth Table for Conditionals Consequent B true false true false Statement A B

true false true true Antacedent A true true false false Truth Table for Equivalences Consequent B true false true false Statement

A B true false false true Truth Table for Negations Statement A true false Negation A false true

Interpretations assign meanings to symbols. In Boolean logic interpretations assign truth values (true, false) to the symbols. An interpretation in Boolean logic is called a valuation. Thus a valuation I is an assignment of truth values (true or false) to each variable in a formula A valid formula P T T F F

Q T F T F P Q T F F F P Q P T T T T A satisfiable invalid formula P T

T F F Q T F T F P Q T F F F (P Q) Q T F T F

An unsatisfiable formula: P P P P P P T F F F T F

Testing Validity Using truth tables is exponential Resolution Davis and Putnams Method Local Search Methods Conjunctive Normal Form Any propositional formula can be put into conjunctive normal form

(clause form). Example: (p q r) (p r) (q r) Represent as sets: {p, q, r}, {p, r}, {q, r} clause clauseclause Conjunctive Normal Form A formula in conjunctive normal form is unsatisfiable if for every

interpretation I, there is a clause C that is false in I. A formula in cnf is satisfiable if there is an interpretation I that makes all clauses true. Binary Resolution Step For any two clauses C1 and C2, if there is a literal L1 in C1 that is complementary to a literal L2 in C2, then delete L1 and L2 from C1 and C2 respectively, and construct the disjunction of the remaining clauses. The constructed clause is a resolvent of C1 and C2.

Examples of Resolution Step C1=a b, C2=b c Complementary literals : b,b Resolvent: ac C1=a bc, C2=b d Complementary literals : b, b Resolvent : a c d Resolution in Propositional Logic 1. 2.

3. 4. 5. a b c b cde ef df a b c b c d e ef d f Resolution in Propositional Logic (continued)

First, the goal to be a a b c proved, a , is negated and added to the b c b clause set. The derivation of c indicates that the e f d e database of clauses d f d is inconsistent. f f

c d Horn clauses At most one positive literal Basis of Prolog Satisfiability can be tested in linear time Resolution is fast for Horn clauses Resolution is very slow for non Horn

clauses Horn clauses: p q r, p q r, r Non Horn clause: p q r Hard problems are usually non-Horn 1. 2. 3. 4. 5. 6. 7. DPLL (Davis and Putnams Method) (Purity rule omitted) If no clauses in KB, return T (Satisfiable) If a clause in KB is empty (FALSE), return F

(Unsatisfiable) If KB has a unit clause C with prop. p, then return DPLL(KB,ppolarity(p,C)) Choose an uninstantiated variable p If DPLL(KB, pTRUE) returns T, return T If DPLL(KB, pFALSE) returns T, return T Return F DPLL Example {p,r},{p,q,r}, {p,r} p=T p= F {T,r},{T,q,r}, {T,r} {F,r},{F,q,r}, {F,r}

SIMPLIF Y {q,r} SIMPLI FY {r}, {r} SIMPLI {} FY DPLL Viewed Abstractly

The call DPLL(KB, pTRUE) is testing interpretations where p is TRUE The call DPLL(KB, pFALSE) is testing interpretations where p is FALSE In this way, interpretations are examined in a sequential manner For each interpretation, a reason is found that the formula is false in it Such a sequential search of interpretations is very fast DPLL (Davis and Putnams method), contiued

DPLL does a backtracking search for a model of the formula DPLL is much faster than propositional resolution for nonHorn clauses Very fast data structures developed Popular for hardware verification Local search can be much faster but is incomplete Systematic methods can now routinely solve verification problems with thousands or tens of thousands of variables, while

local search methods can solve hard random 3SAT problems with millions of variables. (from a conference announcement) NP Complete but Easy How can the satisfiability problem be so easy when it is NP complete? If there are many clauses the proof is likely to be short and can be found quickly If there are few clauses there are

likely to be many interpretations and one is likely to be found quickly The hard problems are in the middle at the satisfiability threshold First Order Logic Formulae may contain Boolean connectives and also variables x, y, z, , predicates P,Q,R, , function symbols f,g,h, , and quantifiers and meaning for all and there exists. Example: x(P(x) yQ(f(x),y)) Individual Constants

Formulae can also contain constant symbols like a,b,c which can be regarded as functions of no arguments. Example: x(P(x) Q(x,c)) Consider the formula yxP(x,y) xyP(x,y). Let the domain be the set of people, and let P(x,y) be x loves y. The formula then is interpreted as if there exists y such that for all x, x loves y, then for all x, there exists y

such that x loves y. In other words, if there is someone that everyone loves, then everyone loves someone. The formula is true under this interpretation. In fact this formula is true under all interpretations, and is a valid formula. Consider this formula: xyP(x,y) yxP(x,y). Under the same interpretation, this formula becomes If for all x, there exists y such that x loves y, then there exists y such that for all x, x

loves y. In other words, if everyone loves someone, then there is someone that everyone loves. This formula is false under this interpretation and is not a valid formula. Clauses An atom is a predicate symbol followed by arguments, as, P(a, f(x)). A literal is an atom or its negation, as,

P(a,f(x)). A clause is a disjunction of literals, often written as a set. Example: {p(x), p(f(x))} for p(x) p(f(x)) A conjunction of clauses is also written as a set, as, {C1, C2, C3} signifying C1 C2 C3. Substitutions A substitution is an assignment of terms to variables. If C is a clause then C is C with the substitution applied uniformly.

Thus {P(x)}{x f(a)} is {P(f(a))}. C is called an instance of C. If C has no variables, it is called a ground instance of C. Semantics Gelernter 1959 Geometry Theorem Prover Adapt semantics to clause form: An interpretation (semantics) I is an assignment of truth values to literals so that I assigns opposite truth values to L and L for atoms L. The literals L and L are said to be

complementary. Semantics We write I C (I satisfies C) to indicate that semantics I makes the clause C true. If C is a ground clause then I satisfies C if I satisfies at least one of its literals. Otherwise I satisfies C if I satisfies all ground instances D of C. (Herbrand interpretations.) If I does not satisfy C then we say I

falsifies C. Example Semantics Specify I by interpreting symbols Interpret predicate p(x,y) as x = y Interpret function f(x,y) as x + y Interpret a as 1, b as 2, c as 3 Then p(f(a,b),c) interprets to TRUE but p(a,b) interprets to FALSE

Thus I satisfies p(f(a,b),c) but I falsifies p(a,b) Obtaining Semantics Humans using mathematical knowledge Automatic methods (finite models) Trivial semantics Herbrands Theorem

A set S of clauses is unsatisfiable if there is a finite unsatisfiable set T of ground instances of S. The basis of uniform proof procedures. Example: S = {{p(a)},{p(x), p(f(x))}, {p(f(f(a)))}} T = {{p(a)},{p(a), p(f(a))}, {p(f(a)), p(f(f(a)))}, {p(f(f(a)))}} {p(a)} {p(x), p(f(x))} {p(f(f(a)))} {p(a)} {p(a), p(f(a))} {p(f(a)), p(f(f(a)))}

{p(f(f(a)))} Criteria to evaluate provers Dont know versus dont care nondeterminism Clauses generated by need or possibility Instantiation by unification or by semantics or neither Clauses selected by semantics

Goal sensitivity Space versus time Resolution Principle Steps for resolution refutation proofs Put the premises or axioms into clause form. Add the negation of what is to be proved, in clause form, to the set of axioms. Resolve these clauses together, producing new clauses that logically follow from them. Produce a contradiction by generating the empty clause. This is possible if and only if the theorem is valid. (Completeness)

Prove that Fido will die. from the statements Fido is a dog., All dogs are animals. and All animals will die. Changing premises to predicates (x) (dog(X) animal(X)) dog(fido) Modus Ponens and {fido/X} animal(fido) (Y) (animal(Y) die(Y))

Modus Ponens and {fido/Y} die(fido) Equivalent Reasoning by Resolution Convert predicates to clause form Predicate form Clause form 1.(x) (dog(X) animal(X)) dog(X) animal(X) 2.dog(fido) dog(fido)

3.(Y) (animal(Y) die(Y)) animal(Y) die(Y) Negate the conclusion 4.die(fido) die(fido) Equivalent Reasoning by Resolution(continued) dog(X) animal(X) animal(Y) die(Y) {Y/X} dog(fido)

dog(Y) die(Y) {fido/Y} die(fido) die(fido) Resolution proof for the dead dog problem Skolemization Skolem constant (X)(dog(X)) may be replaced by dog(fido)

where the name fido is picked from the domain of definition of X to represent that individual X. Skolem function If the predicate has more than one argument and the existentially quantified variable is within the scope of universally quantified variables, the existential variable must be a function of those other variables. (X)(Y)(mother(X,Y)) (X)mother(X,m(X)) (X)(Y)(Z)()(W)(foo (X,Y,Z)(,W)) (X)(Y)(W)(foo(X,Y,f(X,Y),W)) Resolution on the predicate calculus A literal and its negation in parent clauses produce a resolvent only if they

unify under some substitution . is then applied to the resolvent before adding it to the clause set. C1 = dog(X) animal(X) C2 = animal(Y) die(Y) Resolvent : dog(Y) die(Y) {Y/X} C1 = p(X) q(f(X)) C2 = q(Y) r(g(Y)) Resolvent: p(X) r(g(f(X))) Lucky student 1. Anyone passing his history exams and winning the lottery is happy X(pass(X,history) win(X,lottery) happy(X))

2. Anyone who studies or is lucky can pass all his exams. XY(study(X) lucky(X) pass(X,Y)) 3. John did not study but he is lucky study(john) lucky(john) 4. Anyone who is lucky wins the lottery. X(lucky(X) win(X,lottery)) Clause forms of Lucky student 1. pass(X,history) win(X,lottery)

happy(X) 2. study(X) pass(Y,Z)() lucky(W) pass(W,V) 3. study(john) lucky(john) 4. lucky(V) win(V,lottery) 5. Negate the conclusion John is happy happy(john) Resolution refutation for the Lucky Student problem pass(X, history) win(X,lottery) happy(X) win(U,lottery) lucky(U) {U/X} pass(U, history) happy(U) lucky(U)

lucky(john) happy(john) {john/U} pass(john,history) lucky(join) {} pass(john,history) lucky(V) pass(V,W) {john/V,history/W} lucky(john) {} lucky(john) Evaluating resolution

Clauses generated by possibility (bad) Dont care nondeterminism (good) Unification based (good?) No semantics (bad) Uses a large amount of space (bad) Often not goal sensitive (bad) Refinements

Many refinements of resolution have been developed in an attempt to improve its performance Set of support Hyper resolution Ancestry filter form Unit preference Otter

PROBLEM LCL064-1.in LCL064-2.in LCL065-1.in LCL066-1.in LCL067-1.in LCL068-1.in LCL069-1.in LCL070-1.in LCL071-1.in LCL072-1.in SEC 0.14 0.00 0.00 0.00 0.14

0.29 0.00 0.14 0.29 0.00 CLAUSES 1080844 9448 2992 1452 492984 569577 3577 427166 449389 161139 KEPT 8604 1954

653 306 9283 9593 288 8840 8941 6280 Hyper Linking Separates instantiation and inference Given S, selects clauses C and D in S and literals L in C and M in D, and generates instances C and D so

that L and M are complementary. Then C and D are added to S. Periodically S is tested for unsatisfiability using DPLL. Hyper Linking Problem Ph5 Ph9 Latinsq Salt Zebra Input Clauses 45 297 16 44 128

OTTER (sec) 38606.76 >24 hrs >24 hrs 1523.82 >24 hrs Hyper Linking 1.8 2266.6 56.4 28.0 866.2 Eliminating Duplication with the Hyper-Linking Strategy, Shie-Jue Lee and David A. Plaisted,

Journal of Automated Reasoning 9 (1992) 25-42. Later propositional strategies Billons disconnection calculus, derived from hyper-linking Disconnection calculus theorem prover (DCTP), derived from Billons work FDPLL Performance of DCTP on TPTP, 2003

First in EPS and EPR (largely propositional) Third in FNE (first-order, no equality) solving same number as best provers Fourth in FOF and FEQ (all firstorder formulae, and formulae with equality) Not tuned to 50 categories! Definition Detection Problem P1 P2 P3

P4 P5 OSHL Time 0.3 2.3 11.25 1.35 2.0 Otter Time 0.03 1000+ 1000+ 1000+ 1000+ Otter Clauses

51 41867 27656 105244 54660 Replacement Rules with Definition Detection, David A. Plaisted and Yunshan Z)(hu, in Caferra and Salzer, eds., Automated Deduction in Classical and Non-Classical Logics, LNAI 1761 (1998) 80-94. Structure of OSHL Goal sensitivity if semantics chosen properly

Use of natural semantics Choose initial semantics to satisfy axioms For group theory problems, can specify a group Sequential search through possible interpretations Thus similar to Davis and Putnams method Propositional Efficiency

Constructs a semantic tree Ordered Semantic Hyperlinking (Oshl) Reduce first-order logic problem to propositional problem Imports propositional efficiency into firstorder logic The algorithm Imposes an ordering on clauses Progresses by generating instances and refining interpretations I0

D0 I1 I2 D1 unsatisfiable I3 D2 T OSHL

I0 is specified by the user Di is chosen so that Ii falsifies Di Di is an instance of a clause in S Ii is chosen so that Ii satisfies Dj for all j < i Let Ti be {D0,D1, , Di-1}. Ii falsifies Di but satisfies Ti When Ti is unsatisfiable OSHL stops and reports that S is unsatisfiable. Rules of OSHL (C1,C2, , Cn), D minimal contradict I (C1,C2, , Cn,D) (C1,C2, , Cn), Cn not needed (C1,C2, , Cn-1,D) (C1,C2, , Cn,D), max resolution possible

(C1,C2, , Cn-1,res(Cn,D,L)) Example () ({-p1,-p2,-p3}) ({-p1,-p2,-p3},{-p4,-p5,-p6}) ({},{},{-p7}) ({},{},{-p7},{p3,p7}) ({},{-p4,-p5,-p6},{p3}) ({-p1,-p2,-p3},{p3}) ({-p1,-p2}) Number of Clauses Generated

Problem GRP005-1 GRP006-1 GRO007-1 GRP018-1 GRP019-1 GRP020-1 GRP021-1 GRP023-1 GRP032-3 GRP034-3

GRP034-4 GRP042-2 GRP043-2 GRP136-1 GRP137-1 #clauses, Otter 57 62 85 266 267 265 264 79 83 141 222 21 80 0

0 Oshl+semantics 3 7 22 16 15 18 19 22 14 30 6 15 81 8 8 Engineering Issue

OSHL generates about 10 clauses per second Otter generates more than a million clauses per second A factor of 100,000 in engineering! Need to look at search space sizes rather than times Evaluating OSHL

Clauses generated by need (good) Dont care nondeterminism (good) Instantiates using semantics (good) Goal sensitive (good) Space efficient (good) No unification (bad?) Need for more engineering TPTP library by Geoff Sutcliffe & Christian Suttner Thousands of problems for theorem provers Used to benchmark first order theorem provers Contains 6973 theorems at present

CASC competition by Sutcliffe et al. Every year: who has the fastest/most accurate first order theorem prover on the planet? Uses blind test from the TPTP library Current chamption: Vampire By Voronkov and Riazonov in Manchester CADE System Competition

The issue of 50 categories The 300 seconds issue Summary Efficiency of DPLL First-Order Theorem Proving Resolution Propositional Approaches Clause Linking DCTP and the CADE Competition Semantics OSHL