
Learning Semantic Parsers: An Important But Under-Studied Problem
Raymond J. Mooney
Dept. of Computer Sciences, University of Texas at Austin

"The fish trap exists because of the fish. Once you've gotten the fish you can forget the trap. The rabbit snare exists because of the rabbit. Once you've gotten the rabbit, you can forget the snare. Words exist because of meaning. Once you've gotten the meaning, you can forget the words. Where can I find a man who has forgotten words so I can talk with him?" -- The Writings of Chuang Tzu, 4th century B.C.

Natural Language Learning
Most computational research in natural-language learning has addressed low-level syntactic processing:
- Morphology (e.g., past-tense generation)

- Part-of-speech tagging
- Shallow syntactic parsing
- Syntactic parsing

Semantic Language Learning
Learning for semantic analysis has been restricted to relatively small, isolated tasks:
- Word sense disambiguation (e.g., SENSEVAL)
- Semantic role assignment (determining agent, patient, instrument, etc., e.g., FrameNet)
- Information extraction

Cognitive Modeling
Most computational research on natural-language learning is focused on engineering performance on large corpora. Very little attention is paid to modeling human language acquisition.

Language Learning Training Data
Most computational language-learning systems assume detailed, supervised training data that is unavailable to children acquiring language:
- Sentences paired with phonetic transcriptions
- Words paired with their past-tense forms
- Sentences with a POS tag for each word
- Sentences paired with parse trees (treebank)
- Sentences with words tagged with proper senses
- Sentences with semantic roles tagged
- Documents tagged with named entities and semantic relations

Semantic Parsing
- A semantic parser maps a natural-language sentence to a complete, detailed semantic representation (logical form).
- Semantic parser acquisition attempts to automatically induce such parsers from corpora of sentences, each paired only with a semantic representation.
- Assuming a child can infer the likely meaning of an utterance from context, this is more cognitively plausible training data.

CHILL (Zelle & Mooney, 1992-96)
- Semantic parser acquisition system using Inductive Logic Programming (ILP) to induce a parser written in Prolog.
- Starts with a parsing shell written in Prolog and learns to control the operators of this parser to produce the given I/O pairs.
- Requires a semantic lexicon, which for each word gives one or more possible semantic representations.

- The parser must disambiguate words, introduce the proper semantic representation for each, and then put them together in the right way to produce a proper representation of the sentence.

CHILL Example
U.S. geographical database. Sample training pair:
  What is the capital of the state with the highest population?
  answer(C, (capital(S,C), largest(P, (state(S), population(S,P)))))
Sample semantic lexicon:
  what: answer(_,_)
  capital: capital(_,_)
  state: state(_)
  highest: largest(_,_)
  population: population(_,_)
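As a toy illustration of what this logical form computes, the Python sketch below evaluates it against a three-fact database. The facts here are invented stand-ins for the real geography database of about 800 facts; CHILL itself executes the Prolog form directly.

```python
# A toy Python rendering of the sample Geoquery logical form
#   answer(C, (capital(S,C), largest(P, (state(S), population(S,P)))))
# i.e., the capital of the state with the highest population.

state = {"texas", "california", "alaska"}
capital = {"texas": "austin", "california": "sacramento", "alaska": "juneau"}
population = {"texas": 20_851_820, "california": 33_871_648, "alaska": 626_932}

def answer():
    # largest(P, (state(S), population(S,P))): the state with maximal population
    s = max(state, key=population.get)
    # capital(S, C): look up that state's capital
    return capital[s]

print(answer())  # sacramento
```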

WOLFIE (Thompson & Mooney, 1995-1999)
- Learns a semantic lexicon for CHILL from the same corpus of semantically annotated sentences.
- Determines hypotheses for word meanings by finding the largest isomorphic common subgraphs shared by the meanings of sentences in which the word appears.
- Uses a greedy-covering algorithm to learn a small lexicon sufficient to allow compositional construction of the correct representation from the words in a sentence.
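The greedy-covering step can be pictured as a set-cover loop over candidate (word, predicate) lexicon entries. The sketch below is illustrative rather than WOLFIE itself: it omits the subgraph-intersection step that generates meaning hypotheses and simply scores each candidate by how many still-unexplained predicate occurrences it accounts for.

```python
def learn_lexicon(pairs):
    """Greedy-covering sketch in the spirit of WOLFIE.

    pairs: list of (words, predicates) tuples, where `words` is the set of
    words in a sentence and `predicates` the set of predicates in its
    annotated logical form. Entries are added greedily until every
    predicate occurrence in the corpus is accounted for (or no entry helps).
    """
    uncovered = {(i, p) for i, (_, preds) in enumerate(pairs) for p in preds}
    candidates = {(w, p) for words, preds in pairs for w in words for p in preds}

    def gain(entry):
        w, p = entry
        return sum(1 for i, q in uncovered if q == p and w in pairs[i][0])

    lexicon = []
    while uncovered:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # nothing left explains the remaining predicates
        lexicon.append(best)
        w, p = best
        uncovered = {(i, q) for i, q in uncovered
                     if not (q == p and w in pairs[i][0])}
    return lexicon
```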

WOLFIE + CHILL Semantic Parser Acquisition
[Figure: training examples (natural-language sentences paired with logical forms) feed the WOLFIE lexicon learner; the resulting semantic lexicon, together with the training examples, feeds the CHILL parser learner, which outputs a semantic parser mapping natural language to logical form.]

U.S. Geography Corpus
- Queries for a database of about 800 facts on U.S. geography.
- Collected 250 sample questions from undergraduate students in a German class.
- Questions annotated with the correct logical form in Prolog.
- Questions also translated into Spanish, Turkish, and Japanese.

Experimental Evaluation
- 10-fold cross-validation over the sentences in the corpus.
- Generate learning curves by training on increasing fractions of the total training set.
- Test accuracy by determining whether, when executed in Prolog, the generated logical query produces the same answer from the database as the correct logical query.
- Compare performance to a manually-written NL interface called Geobase.
- Compare performance with manually-written vs. learned semantic lexicons.
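This protocol is straightforward to state as code. In the sketch below, `train_parser` and `execute` are hypothetical stand-ins for running WOLFIE + CHILL on a training set and for executing a logical query against the database; the fraction schedule is chosen for illustration.

```python
import random

def accuracy(parser, test_pairs, execute):
    # A parse counts as correct if its query returns the same answer
    # from the database as the gold query (the criterion above).
    correct = sum(1 for sentence, gold in test_pairs
                  if execute(parser(sentence)) == execute(gold))
    return correct / len(test_pairs)

def learning_curve(train_pairs, test_pairs, train_parser, execute,
                   fractions=(0.1, 0.25, 0.5, 0.75, 1.0)):
    # Train on increasing fractions of the training set, as in the
    # reported learning curves.
    random.shuffle(train_pairs)
    return [(f, accuracy(train_parser(train_pairs[:int(f * len(train_pairs))]),
                         test_pairs, execute))
            for f in fractions]
```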

CHILL + WOLFIE Learning Curves
[Figure: learning curves for CHILL + WOLFIE.]

Cognitively Plausible Aspects of CHILL/WOLFIE
- Deterministic parsing that processes a sentence one word at a time (evidence from garden-path sentences).
- Dynamically integrates syntactic and semantic cues during parsing (evidence from real-time studies of language comprehension; Tanenhaus et al.).
- Learns from plausible input of sentences paired only with semantic form.

Cognitively Implausible Aspects of CHILL/WOLFIE
- Batch training on complete corpora, rather than incremental training that processes one sentence at a time (Siskind, 1996).
- Lexicon learning is disconnected from parser learning.
- Assumes each sentence is annotated with a single, correct semantic form.

Interactions between Syntax and Lexicon Acquisition
- Syntactic bootstrapping allows children to use verb syntax to help acquire verb meanings (Gleitman, 1990):
    Big Bird and Elmo are gorping.
    Big Bird is gorping Elmo.

Contextually Ambiguous Sentence Meaning
- Sentences are uttered in complex situations composed of numerous potential meanings.
- Could assume each sentence is annotated with multiple possible meanings inferred from context (Siskind, 1996); cf. multiple-instance learning (Dietterich et al., 1997).
- Assuming context meaning is represented as a semantic network, sentence meaning could be assumed to be any connected subgraph of the context. (A sketch of enumerating such subgraphs follows below.)

Sample Ambiguous Context
"Juvenile caresses canine."
[Figure, shown repeatedly with different candidate subgraphs highlighted: a semantic network representing the context, including a Petting event with agent and patient roles; Mary (isa Child, HasColor Blonde); Spot (isa Dog, HasColor Black); a Chewing event; a Possess relation; a Barbie (isa Doll, HasPart Hair); a Bone; and placeholder nodes Thing1 and Thing2.]
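Under the connected-subgraph assumption, candidate meanings can in principle be enumerated directly. A minimal sketch over an unlabeled toy graph (the real context network has labeled nodes and edges, and the combinatorics would need to be controlled):

```python
from itertools import combinations

def connected_subsets(nodes, edges, max_size=4):
    """Yield every connected subset of up to max_size nodes; under the
    assumption above, each is a candidate sentence meaning."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def connected(subset):
        seen, frontier = set(), [subset[0]]
        while frontier:
            n = frontier.pop()
            if n not in seen:
                seen.add(n)
                frontier.extend((adj[n] & set(subset)) - seen)
        return len(seen) == len(subset)

    for k in range(1, max_size + 1):
        for subset in combinations(nodes, k):
            if connected(subset):
                yield subset

# Toy fragment of the slide's context network:
nodes = ["Petting", "Mary", "Spot", "Child", "Dog"]
edges = [("Petting", "Mary"), ("Petting", "Spot"),
         ("Mary", "Child"), ("Spot", "Dog")]
for meaning in connected_subsets(nodes, edges, max_size=3):
    print(meaning)
```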

Issues in Engineering Motivation
- Most computational language-learning research strives for broad coverage while sacrificing depth: "scaling up by dumbing down."
- Realistic semantic parsing currently entails domain dependence.
- Domain-dependent natural-language interfaces have a large potential market.
- Learning makes developing specific applications more tractable.
- Training corpora can be easily developed by tagging existing corpora of formal statements with natural-language glosses.

Robocup Coach Competition
- Simulated robot soccer competition.

- Coachable teams can take advice on how to play the game.
- Coaching instructions are provided in a formal language called CLANG.

Project on Interactive Learning from Language Advice and Reinforcements
"Broadening the Communication Channel Between Machine Learners and their Human Teachers"
Raymond J. Mooney, Jude Shavlik, Rich Maclin, Peter Stone, Risto Miikkulainen
DARPA Machine Learning Seedling Project

CLANG Corpus
- We collected 500 examples of CLANG statements written by humans for the Robocup Coach Competition.
- Each statement was annotated with a synonymous English sentence.

Samples from the English/CLANG Corpus

If player 4 has the ball, it should pass the ball to player 2 or 10.
  ((bowner our {4}) (do our {4} (pass {2 10})))

No one pass to the goalie.
  ((bowner our {0}) (dont our {0} (pass {1})))

If the ball is in the left upper half of the field, and player 2 is within distance 5 of the ball, it should intercept the ball.
  ((and (bpos (rec (pt -52.50 -34.00) (pt 0.00 34.00))) (ppos our {2} 1 11 (arc (pt ball) 0.00 5.00 0.00 360.00))) (do our {2} (intercept)))

If players 9, 10 or 11 have the ball, they should shoot and should not pass to players 2-8.
  ((bowner our {9 10 11}) (do our {9 10 11} (shoot)) (dont our {9 10 11} (pass {2 3 4 5 6 7 8})))

New Approaches to Semantic Parsing
- Directly mapping NL sentences to logical form using string-to-tree transduction rules.

- Mapping NL syntactic parse trees to logical form using tree-to-tree transduction rules.
- Integrated syntactic/semantic parsing using syntactic and semantic knowledge bases.

String-to-Tree Transduction
- This approach exploits the CLANG grammar but not an English grammar.
- Sample production rules of the CLANG grammar:
    ACTION    -> (pass UNUM_SET)
    CONDITION -> (bowner our UNUM_SET)
    CONDITION -> (play_m PLAY_MODE)
- Each CLANG statement can be unambiguously parsed, e.g.:
    ((bowner our {2}) (do our {2} (pass {10})))

[Parse tree of the statement:]
  RULE
    CONDITION -> (bowner our UNUM_SET), UNUM_SET = {2}
    DIRECTIVE -> (do our UNUM_SET ACTION), UNUM_SET = {2}
      ACTION -> (pass UNUM_SET), UNUM_SET = {10}
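Because every CLANG statement is a parenthesized expression, this unambiguous parsing step is mechanical. A minimal reader sketch (the real CLANG grammar further constrains which nested forms are legal; braces are read like parentheses here, so a set becomes a list):

```python
def parse_clang(text):
    """Read a CLANG statement into nested Python lists."""
    tokens = (text.replace("(", " ( ").replace(")", " ) ")
                  .replace("{", " { ").replace("}", " } ").split())

    def read(pos):
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok in ("(", "{"):
                sub, pos = read(pos + 1)
                items.append(sub)
            elif tok in (")", "}"):
                return items, pos + 1
            else:
                items.append(tok)
                pos += 1
        return items, pos

    tree, _ = read(0)
    return tree[0]

print(parse_clang("((bowner our {2}) (do our {2} (pass {10})))"))
# [['bowner', 'our', ['2']], ['do', 'our', ['2'], ['pass', ['10']]]]
```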

Transduction Rules
- Rules convert sub-strings of natural-language sentences into CLANG production instances in the resulting parse tree:
    player N has ball         => CONDITION (bowner our {N})
    the ball is with player N => CONDITION (bowner our {N})
- Replace the matched sub-strings by the non-terminals and apply rules for higher-level CLANG production rules:
    If CONDITION then DIRECTIVE => RULE (CONDITION DIRECTIVE)

Example
Sentence: If player 2 has the ball, player 2 should pass the ball to player 10.
CLANG representation: ((bowner our {2}) (do our {2} (pass {10})))
Rule: player N has [1] ball => CONDITION (bowner our {N})

Applying the rule rewrites the sentence to:
  If CONDITION, player 2 should pass the ball to player 10.
  CONDITION = (bowner our {2})

Example contd.
Sentence: If CONDITION, player 2 should pass the ball to player 10.
Rule: pass [1] ball to player N => ACTION (pass {N})
Applying the rule:
  If CONDITION, player 2 should ACTION.
  ACTION = (pass {10})

Example contd.
Sentence: If CONDITION, player 2 should ACTION.
Rule: player N should ACTION => DIRECTIVE (do our {N} ACTION)
Applying the rule:
  If CONDITION, DIRECTIVE.
  DIRECTIVE = (do our {2} (pass {10}))

Example contd.
Sentence: If CONDITION, DIRECTIVE.
Rule: If CONDITION [1] DIRECTIVE . => RULE (CONDITION DIRECTIVE)

Applying the rule completes the parse:
  RULE = ((bowner our {2}) (do our {2} (pass {10})))
(A sketch of this matching-and-replacement step appears after the next slide.)

Experiment with Manually-Built Rules
- To test the feasibility of this approach, rules were manually written to cover 40 examples.
- When tested on 63 previously unseen examples, the rules covered 18 examples completely and the remaining examples partially.
- A good indication that the approach can work if manually-built rules can be automatically learned.
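The matching-and-replacement step from the worked example might look like the following sketch. The reading of the notation (N captures a player number; [k] allows up to k intervening words) is our assumption, and `rule_regex` and `apply_rule` are illustrative names, not part of the actual system:

```python
import re

def rule_regex(pattern):
    """Compile the slide notation to a regex over a space-tokenized sentence."""
    pieces = []
    for tok in pattern.split():
        if tok == "N":
            pieces.append((r"(?P<N>\d+)", True))          # player number
        elif re.fullmatch(r"\[\d+\]", tok):
            pieces.append((r"(?:\s+\S+){0,%s}" % tok[1:-1], False))  # word gap
        else:
            pieces.append((re.escape(tok), True))          # literal word
    regex = pieces[0][0]
    for piece, is_word in pieces[1:]:
        regex += (r"\s+" + piece) if is_word else piece
    return regex

def apply_rule(sentence, pattern, nonterminal, template):
    """Replace the leftmost match with the non-terminal and instantiate
    the rule's CLANG template."""
    m = re.search(rule_regex(pattern), sentence)
    if m is None:
        return sentence, None
    n = m.groupdict().get("N")
    meaning = template.replace("{N}", "{%s}" % n) if n else template
    return sentence[:m.start()] + nonterminal + sentence[m.end():], meaning

s = "If player 2 has the ball , player 2 should pass the ball to player 10 ."
s, cond = apply_rule(s, "player N has [1] ball", "CONDITION", "(bowner our {N})")
print(s)     # If CONDITION , player 2 should pass the ball to player 10 .
print(cond)  # (bowner our {2})
```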

Learning Transduction Rules
- Parse all the CLANG examples.
- For every production rule in the CLANG grammar: call those sentences positive whose CLANG representation's parse uses that production rule; call the remaining sentences negative.
- Learn rules using ELCS (Extraction using Longest Common Subsequences), an information extraction system which we developed for extracting protein-protein interactions from biological text.
- Given examples of positive and negative sentences, ELCS repeatedly generalizes positive sentences to form rules until the rules become overly general and start matching negative examples.

Generalization Method: Longest Common Subsequence
  Whenever the ball is in REGION player 6 should be positioned at REGION .
  If the ball is in the near quarter of the field , player 2 should position itself at REGION .
generalize to:
  the [0] ball [0] is [0] in [7] player [1] should [2] at [0] REGION
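The generalization on this slide can be approximated with an off-the-shelf longest-common-subsequence alignment; the gap bookkeeping (taking the larger of the two skips as what [k] records) is our assumption about the notation, and the exact accounting inside ELCS may differ:

```python
from difflib import SequenceMatcher

def generalize(s1, s2):
    """Keep the words of a common subsequence of the two sentences and
    record between them how many words may intervene."""
    a, b = s1.split(), s2.split()
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    aligned = [(blk.a + k, blk.b + k) for blk in blocks for k in range(blk.size)]
    out, prev = [], None
    for i, j in aligned:
        if prev is not None:
            out.append("[%d]" % (max(i - prev[0], j - prev[1]) - 1))
        out.append(a[i])
        prev = (i, j)
    return " ".join(out)

s1 = "Whenever the ball is in REGION player 6 should be positioned at REGION ."
s2 = ("If the ball is in the near quarter of the field , "
      "player 2 should position itself at REGION .")
print(generalize(s1, s2))
# the [0] ball [0] is [0] in [7] player [1] should [2] at [0] REGION [0] .
# (the slide's pattern, plus the shared trailing period)
```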

Example of Learned Rules: CONDITION (bpos REGION)

Positives:
- The ball is in REGION , our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.
- If the ball is in REGION and not in REGION then player 3 should intercept the ball.
- During normal play if the ball is in the REGION then player 7 , 9 and 11 should dribble the ball to the REGION .
- If our team has the ball and the ball is in REGION , then it should be passed to REGION .
- If the ball is in REGION then position player 7 at REGION .
- When the play mode is normal and the ball is in the REGION then our player 2 should pass the ball to the REGION .
- All players except the goalie should pass the ball to RP12 if it is in RP18.
- If the ball is inside rectangle ( -54 , -36 , 0 , 36 ) then player 10 should position itself at REGION with a ball attraction of REGION .
- Player 2 should pass the ball to REGION if it is in REGION .

Negatives:
- If our player 6 has the ball then he should take a shot on goal.
- If player 4 has the ball , it should pass the ball to player 2 or 10.
- If the condition DR5C3 is true , then player 2 , 3 , 7 and 8 should pass the ball to player 3.
- If "DR6C11" , player 10 , 3 , 4 , or 5 should the ball to player 11.
- if DR1C7 then players 10 , 3 , 4 and 5 should pass to player 5.
- During play on , if players 6 , 7 or 8 is in REGION , they should pass the ball to players 9 , 10 or 11.
- If "Clear_Condition" , players 2 , 3 , 7 or 5 should clear the ball REGION .
- If it is before the kick off , after our goal or after the opponent's goal , position player 3 at REGION .
- If the condition MDR4C9 is met , then players 4-6 should pass the ball to player 9.
- If Pass_11 then player 11 should pass to player 9 and no one else.

Rules learned by ELCS:
  ball [0] is [2] REGION => CONDITION (bpos REGION)
  the [0] ball [0] in [0] REGION => CONDITION (bpos REGION)
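The "generalize until overly general" loop itself can be sketched as follows; the greedy pairing strategy and the `matches` predicate (does a gap pattern match a sentence?) are stand-ins, since the slides do not spell out ELCS's control structure:

```python
def learn_rules(positives, negatives, generalize, matches):
    """Greedy sketch of ELCS-style rule induction: repeatedly merge two
    patterns into their generalization, but reject any generalization
    that matches a negative sentence (i.e., has become overly general)."""
    pool = list(positives)
    rules = []
    while len(pool) > 1:
        candidate = generalize(pool[0], pool[1])
        if any(matches(candidate, neg) for neg in negatives):
            rules.append(pool.pop(0))   # keep the more specific pattern
        else:
            pool = [candidate] + pool[2:]  # merged pair -> one pattern
    return rules + pool
```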

Resources Required for Progress in Semantic Parser Acquisition
- More corpora of sentences annotated with logical form.
- More researchers studying the problem.
- More algorithms for addressing the problem.
- More ideas, constraints, and methods from psycholinguistics that can be exploited.
- More psycholinguistic issues that can be explored.
