Methods for Open Information Extraction and Sense Disambiguation


Methods for Open Information Extraction and Sense Disambiguation on Natural Language Text
Luciano Del Corro
January 11, 2016

Making machines read
Dante completed the Divine Comedy in 1320 and died one year later in the city of Ravenna.
[Figure: the sentence annotated with 1320 (year), finished, 1321 (year), diedIn, and city (location).]

Why automatic text understanding?

Related work
[Figure: map of related tasks: syntactic processing, word sense induction, word sense disambiguation, word entry recognition, named entity recognition and disambiguation, named entity typing, event extraction, open information extraction, relation extraction, semantic role labeling, semantic parsing, discourse parsing, credibility assessment, rule mining, taxonomy induction, ontologies, KB construction, and KB maintenance, leading to applications such as keyword search, semantic search, text categorization, summarization, sentiment analysis, machine translation, question answering, dialogue systems, and privacy.]
Knowledge + Reasoning = Applications

Semantic layers: an incremental approach
Input sentence: Dante completed the Divine Comedy in 1320 and died one year later in the city of Ravenna.
1. Open information extraction (ClausIE):
   (Dante, completed, the Divine Comedy, in 1320)
   (Dante, died, in Ravenna)
2. Word sense disambiguation (Werdy):
   completed: come or bring to a finish or an end
   died: stop living (person, animal, or plant)
3. Named entity typing (FINET):
   Dante: person (a human being), poet (a writer of poems)
   the Divine Comedy: epic poem (a long narrative poem about a hero's deeds)
   1320: year
   Ravenna: city (a large and densely populated urban area)
4. Named entity disambiguation:
   (Dante Alighieri, completed, Divine Comedy, in 1320)
   (Dante Alighieri, died, in Ravenna)
   Dante Alighieri: a major Italian poet of the Middle Ages; Divine Comedy: an Italian epic poem by Dante; Ravenna: a city in Italy
5. Relation extraction, feeding a knowledge base (KB):
   finished(Dante Alighieri, Divine Comedy, 1320): the completion of a piece of work by a person at a certain date
   diedIn(Dante Alighieri, Ravenna): the death of a person at a certain location

Outline

1. ClausIE
2. Werdy
3. FINET
4. Conclusion

Open information extraction: from sentences to propositions
Goal: extract propositions from natural language text.
Example: Bell, a telecommunication company, which is based in Los Angeles, makes and distributes electronic, computer and building products.

Subject | Relation | Argument
Bell | is | a telecommunication company
Bell | is based in | Los Angeles
Bell | makes | electronic products
Bell | distributes | electronic products

Most OIE extractors:
- use supervised methods or hand-crafted heuristics,
- are limited to triples and verb-mediated relations,
- lack a flexible representation: (Gandhi, was, vegetarian) or (Gandhi, was, a vegetarian)?

ClausIE overview: a two-step approach
1. Detect information:

   - based on principles of the English language
   - universal (i.e., domain- and application-independent)
   - avoids filtering information
   - covers verb-based and non-verb-based propositions
2. Represent information:
   - application-dependent
   - triples or n-ary propositions?
   - what is the form of the relation? ("is based" or "is based in")
   - what is the scope of the arguments? (e.g., named entities, noun phrases, adjectives)
ClausIE separates recognition from representation.

A clause, a proposition
- ClausIE extracts one proposition from each clause.
- A clause is like a simple sentence; a sentence can be composed of more than one clause.
- A clause can have optional adverbials; a minimal clause is a clause without optional adverbials.
- There are only seven possible clause types without optional adverbials.
[Figure: the Bell example sentence split into its clauses, e.g. "Bell is based in Los Angeles".]
- ClausIE collects all verb-based propositions (and some non-verb-based ones) and identifies their essential and optional information.

The seven clause types
1. SV(i): Albert Einstein died.
2. SV(c)A: Albert Einstein remained in Princeton.
3. SV(c)C: Albert Einstein is smart.
4. SV(t)O: AE (Albert Einstein) has won the Nobel Prize.
5. SV(dt)OO: RSAS (the Royal Swedish Academy of Sciences) gave AE the Nobel Prize.

6. SV(ct)OA: The doorman showed AE to his office.
7. SV(ct)OC: AE declared the meeting open.
S: subject, V: verb, O: object, A: adverbial, C: complement; the letters in parentheses mark the verb as intransitive (i), copular (c), transitive (t), ditransitive (dt), or complex-transitive (ct).

From clauses to clause types to propositions
[Figure: the dependency parse of "Albert Einstein died in Princeton." (root, nsubj, nn, prep, pobj) yields the propositions (S: AE, V: died) and (S: AE, V: died, A: in Princeton). A decision tree over the dependency-parse clause assigns one of the types intransitive (SV), copular (SVC), extended copular (SVA), monotransitive (SVO), ditransitive (SVOO), complex transitive (SVOA), and complex transitive (SVOC), asking: Object? Complement? Candidate adverbial? Known non-extended copular? Known extended copular? Direct and indirect object? Candidate adverbial and direct object? Potentially complex-transitive? Conservative?]

Results: ReVerb dataset
[Figure: results on the ReVerb dataset.]

2. Werdy

Word sense disambiguation
Goal: assign dictionary senses to words in text.
Example: Messi did not play three games last season because he was injured.
1. Recognize the entry: play (verb), with candidate senses
   - participate in games or sport
   - perform music on (a musical instrument)
   - contend against an opponent in a sport, game, or battle
2. Syntactic and semantic pruning

   Remaining sense: participate in games or sport
3. Select the sense using the context (game, season, injured): participate in games or sport

Entry recognition
Challenges:
- A dictionary entry can be multi-word [take a breath], in which each word may itself be an entry [take, breath].
- An entry can appear discontinuously [take a deep breath].
- A word can have multiple senses.
Principle: a dictionary entry contains its own syntactic structure.
Methodology: match dependency-parse subtrees of the sentence against the dictionary.
[Figure: for "She took his hand", take is included, extending to take hand fails, and matching stops. For "She took a deep breath", take is included, take a breath is included and matching continues, extending to take (a) deep breath fails, and matching stops; result: took → take, take a breath.]

Syntactic pruning
Principle: a verb sense occurs in a limited number of clause types.
Methodology: prune the senses that cannot occur in the given clause.
Example: He must attend to this matter.
Senses of attend:
1. be present at (meetings, church services, university, etc.): "She attends class regularly"
2. take charge of or deal with: "She must attend to all the details"
3. to accompany as a circumstance or follow as a result: "She was attended by an ovation"
4. work for or be a servant to: "She attends the old lady in the wheelchair"
5. give heed (to)

   "She attended the recital"
Clause type of the example sentence: SVA.
WordNet frames (clause type): Somebody ----s something (SVO); Somebody ----s PP (SVA); Something ----s something (SVO); Somebody ----s somebody (SVO); Somebody ----s (SV).

Semantic pruning
Principle: a verb sense determines the semantic type of its arguments.
Methodology: prune the senses whose expected argument type does not match the actual argument.
Example: He plays soccer.
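Both pruning steps act as filters over a verb's candidate senses before a standard WSD system chooses among the survivors. The sketch below illustrates that idea under stated assumptions and is not Werdy's implementation: the hypernym table, the expected argument types for play, and the pairing of the WordNet frames with the five senses of attend are hypothetical, and clause types are matched exactly. Senses without an expected argument type are kept, which is why sense 5 of play survives semantic pruning in this toy setup.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Toy hypernym links; a hypothetical stand-in for WordNet's noun hierarchy.
HYPERNYMS = {
    "soccer": "sport",
    "sport": "activity",
    "guitar": "musical_instrument",
    "musical_instrument": "device",
}


@dataclass(frozen=True)
class Sense:
    gloss: str
    clause_types: FrozenSet[str]    # clause types allowed by the sense's frames
    arg_type: Optional[str] = None  # expected type of the object (None = unconstrained)


def hypernym_chain(word: str) -> List[str]:
    """Return the word followed by its toy hypernyms, most specific first."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def syntactic_prune(senses: List[Sense], clause_type: str) -> List[Sense]:
    """Keep only senses whose frames allow the clause type found in the sentence."""
    return [s for s in senses if clause_type in s.clause_types]


def semantic_prune(senses: List[Sense], argument: str) -> List[Sense]:
    """Drop senses whose expected argument type is not the argument or one of its hypernyms."""
    chain = hypernym_chain(argument)
    return [s for s in senses if s.arg_type is None or s.arg_type in chain]


# Senses of "attend"; the frame-to-sense pairing is an assumption for illustration.
ATTEND = [
    Sense("be present at (meetings, church services, university, etc.)", frozenset({"SVO"})),
    Sense("take charge of or deal with", frozenset({"SVA"})),
    Sense("to accompany as a circumstance or follow as a result", frozenset({"SVO"})),
    Sense("work for or be a servant to", frozenset({"SVO"})),
    Sense("give heed (to)", frozenset({"SV"})),
]

# Senses of "play" with hypothetical expected argument types.
PLAY = [
    Sense("participate in games or sport", frozenset({"SV", "SVO"}), "sport"),
    Sense("perform music on (a musical instrument)", frozenset({"SV", "SVO"}), "musical_instrument"),
    Sense("bet or wager (money)", frozenset({"SVO"}), "money"),
    Sense("put (a card or piece) into play during a game, or act strategically as if in a card game",
          frozenset({"SVO"}), "card"),
    Sense("engage in an activity as if it were a game rather than take it seriously",
          frozenset({"SV", "SVO"})),
]

if __name__ == "__main__":
    # "He must attend to this matter" is an SVA clause.
    for sense in syntactic_prune(ATTEND, "SVA"):
        print(sense.gloss)
    # "He plays soccer" is an SVO clause whose object is "soccer".
    for sense in semantic_prune(syntactic_prune(PLAY, "SVO"), "soccer"):
        print(sense.gloss)
```

Running the two examples leaves "take charge of or deal with" for attend, and two senses of play, one of which ("participate in games or sport") is the intended sense.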

Hypernym of soccer: sport
Senses of play (VOS repository):
1. participate in games or sport
2. perform music on (a musical instrument)
3. bet or wager (money)
4. put (a card or piece) into play during a game, or act strategically as if in a card game
5. engage in an activity as if it were a game rather than take it seriously

Results
Dataset: SemEval-2007 coarse-grained disambiguation task (400+ verbs).
Methodology: entry recognition + pruning + a standard WSD system (we tried five).
Entry recognition: 2 incorrect, both due to dependency-parse errors (out of 25 multi-word verbs).

F1 of DKPro ExtSimpleLesk (+ MFS, most frequent sense):

Configuration | Gold entries | With entry recognition
No pruning | 76.26 | 73.44 (-2.82)
+ Syntactic pruning | 78.56 (+2.30) | 75.82 (+2.38)
+ Semantic pruning | 81.18 (+2.62) | 78.52 (+2.70)

3. FINET

Named entity typing
Goal: detect the types of named entities in a given context with respect to a type system (e.g., WordNet).

Example: Page plays his guitar on stage.
Existing systems are supervised or semi-supervised:
- supervised systems require manually labeled data;
- semi-supervised systems generate training data automatically from Wikipedia and a KB.
Context-oblivious typing gives Klitschko the same types in "Klitschko is the mayor of Kiev" and in "Klitschko is known for his powerful punches": politician, boxer, mayor, expatriate, man, ...
FINET is an unsupervised, context-aware, super-fine-grained system (more than 16K types for PER, LOC, and ORG).

Overview of FINET: context-aware typing
1. Very explicit: "Steinmeier, the German Foreign Minister, ..." (stop?)
2. Explicit: "Imperial College London is in South Kensington" (stop?)
3. Almost explicit: "Messi plays soccer" (stop?)

4. Implicit: "Pavano never made it to the mound" (stop?)
Final step, type selection: what is the best type given the context?

1. Pattern-based extractor for very explicit types
Principle: extract the types explicitly mentioned in the clause.
Methodology: use syntactic and regex patterns (13 patterns).
Example: Barack Obama, the president of the US, plans to visit Cuba.
Pattern: NAMED ENTITY , (modifier) TYPE (modifier), where the type is attached to the entity as an appositive (appos) with optional modifiers (mod).
Not handled by this extractor: Imperial College London is located in South Kensington (?)
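The appositive pattern NAMED ENTITY , (modifier) TYPE (modifier) can be approximated with a single regular expression. This is a hypothetical sketch using Python's re module, not one of FINET's 13 patterns: the regex, the preposition list used to strip post-modifiers such as "of the US", and the function name explicit_types are assumptions, and a dependency-based pattern (appos, mod) would be more robust than matching on capitalization and commas.

```python
import re
from typing import List, Tuple

# Hypothetical approximation of "NAMED ENTITY , (modifier) TYPE (modifier) ,":
# a capitalized mention, a comma, a determiner, and the appositive up to the next comma.
APPOSITIVE = re.compile(
    r"(?P<entity>[A-Z]\w*(?:\s+[A-Z]\w*)*)\s*,\s*"
    r"(?:the|a|an)\s+(?P<phrase>[^,]+?)\s*,"
)

# Words that start a post-modifier; the type phrase is what precedes them.
PREPOSITIONS = {"of", "in", "at", "for", "from", "with"}


def explicit_types(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (entity, type phrase, lowercased head noun) for each appositive match."""
    results = []
    for match in APPOSITIVE.finditer(sentence):
        tokens = match.group("phrase").split()
        # Cut the post-modifier: keep only the tokens before the first preposition.
        for i, token in enumerate(tokens):
            if token.lower() in PREPOSITIONS:
                tokens = tokens[:i]
                break
        if tokens:
            results.append((match.group("entity"), " ".join(tokens), tokens[-1].lower()))
    return results


if __name__ == "__main__":
    print(explicit_types("Barack Obama, the president of the US, plans to visit Cuba."))
    print(explicit_types("Steinmeier, the German Foreign Minister, visited Kiev."))
    print(explicit_types("Imperial College London is located in South Kensington."))
```

The last sentence yields no match, which is the case the slide passes on to the mention-based extractor.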

2. Mention-based extractor for explicit types
Principle: extract the type embedded in the entity mention.
Methodology: look for types in the mention itself.
Example: Imperial College London is located in South Kensington.
Not handled by this extractor: Messi plays soccer (?)

3. Verb-based extractor for almost explicit types
Principle: a verb determines the semantic type of its arguments.
Methodology: transform the verb into a noun via suffixes.
Example: Messi plays soccer: play → player → soccer player.
Not handled by this extractor: Maradona expects to win in South Africa (?)
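The mention-based and verb-based extractors can both be sketched as lookups against a set of known type names. Everything here is an assumption for illustration: KNOWN_TYPES is a tiny hypothetical stand-in for the WordNet noun lemmas FINET types against, the suffix list is guessed, and real nominalization would need morphology beyond string concatenation (for example, doubled consonants).

```python
from typing import List, Optional

# Hypothetical subset of type names (stand-in for WordNet noun lemmas).
KNOWN_TYPES = {
    "college", "university", "player", "soccer player", "guitar player",
    "writer", "director", "coach",
}

# Hypothetical agentive suffixes used to turn a verb lemma into a noun.
SUFFIXES = ("er", "r", "or", "ist")


def mention_types(mention: str) -> List[str]:
    """Mention-based extractor: type names that occur as words inside the mention."""
    return [word for word in mention.lower().split() if word in KNOWN_TYPES]


def verb_types(verb_lemma: str, obj: Optional[str] = None) -> List[str]:
    """Verb-based extractor: nominalize the verb, listing the object-qualified form first."""
    found = []
    for suffix in SUFFIXES:
        noun = verb_lemma + suffix
        if noun not in KNOWN_TYPES:
            continue
        if obj is not None and f"{obj} {noun}" in KNOWN_TYPES:
            found.append(f"{obj} {noun}")  # e.g. "soccer player"
        found.append(noun)                 # e.g. "player"
    return found


if __name__ == "__main__":
    print(mention_types("Imperial College London"))  # type embedded in the mention
    print(verb_types("play", "soccer"))               # Messi plays soccer
    print(verb_types("write"))                        # "write" + "r" gives "writer"
```

For Messi plays soccer, verb_types("play", "soccer") lists the object-qualified "soccer player" before the generic "player", mirroring the play → player → soccer player chain on the slide; a verb such as expect yields nothing, which is the case left to the corpus-based extractor.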

4. Corpus-based extractor for implicit types
Principle: distributional hypothesis (similar entities occur in similar contexts).
Methodology: use word vectors to collect entities that match a given context.
word2vec: returns the k phrases most similar to a set of phrases.
Example: Maradona expects to win in South Africa.
Query: Maradona, South Africa
Mention | Type
Diego Maradona | ..
Parreira | ..
Carlos Alberto Parreira | ..
Dunga | ..
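The word2vec query, which returns the k phrases most similar to a set of phrases, can be sketched as cosine similarity against the mean of the unit-normalized query vectors. The three-dimensional vectors below are made up for illustration, and joining multi-word phrases with underscores is an assumption; with trained vectors, gensim's KeyedVectors.most_similar(positive=[...], topn=k) performs a comparable query.

```python
import math
from typing import Dict, List, Sequence

# Toy phrase vectors (hypothetical values); real ones would come from a trained word2vec model.
VECTORS: Dict[str, List[float]] = {
    "Maradona": [0.9, 0.8, 0.1],
    "South_Africa": [0.7, 0.9, 0.2],
    "Diego_Maradona": [0.85, 0.85, 0.1],
    "Parreira": [0.8, 0.9, 0.15],
    "Dunga": [0.75, 0.8, 0.2],
    "Ravenna": [0.1, 0.2, 0.9],
}


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def most_similar(query: Sequence[str], vectors: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k phrases with the highest cosine similarity to the mean query vector."""
    units = [unit(vectors[q]) for q in query]
    mean = unit([sum(col) / len(units) for col in zip(*units)])
    scores = {
        phrase: sum(a * b for a, b in zip(mean, unit(vec)))
        for phrase, vec in vectors.items()
        if phrase not in query  # never return the query phrases themselves
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Context of "Maradona expects to win in South Africa".
    print(most_similar(["Maradona", "South_Africa"], VECTORS, k=3))
```

With these toy vectors the top three results are Diego_Maradona, Parreira, and Dunga, while Ravenna ranks last; the types of the retrieved mentions then become candidate types for Maradona.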

Example sentences: "Parreira coached Brazil in South Africa"; "Dunga replaced Parreira after South Africa".

Final step: type detection via word sense disambiguation
Goal: given a set of candidate types, determine which one best fits the context.
Context:
- entity-oblivious: all words in the sentence;
- entity-specific: entity-related words obtained from word vectors.
Example: Maradona expects to win in South Africa.
- Entity-oblivious context: Maradona, expects, win, South Africa
- Entity-specific context: coach, cup, striker, midfielder, captain
Method: estimate the probability of the context under each candidate sense (Naive Bayes), using sense contexts such as:
- sport, train, athlete, team, coach, manager, handler, trainer, ...
- athlete, soccer, player, train, compete, sport, hockey, ...

Results
Typing quality (P: precision; correct: number of correct types)
System | Coarse-grained (Stanford NER): P / correct | Fine-grained: P / correct | Super-fine-grained: P / correct
FINET | 87.90 / 872 | 72.42 / 457 | 70.82 / 233
HYENA | 72.40 / 779 | 28.26 / 522 | 20.65 / 160

System | Distinct types | Avg. depth (fine-grained) | Avg. depth (super-fine-grained)
FINET | 191 | 5.96 | 7.25
HYENA | 127 | 5.79 | 6.98
Dataset: NYT (500 sentences).

4. Conclusion

The methods are mostly unsupervised, domain-independent, and linguistically based.
- ClausIE (WWW 2013) [OIE]: understands the structure of the information; offers initial support for non-verb-mediated relations; keeps the representation step flexible.
- Werdy (EMNLP 2014) [WERD]: provides a method for multi-word expression recognition; explicitly incorporates the syntactic and semantic dependence of verb senses on their context.
- FINET (EMNLP 2015) [entity typing]: context-aware, tailored to short inputs, and super-fine-grained.

Thank you!
