DeepPath: A Reinforcement Learning Method for Knowledge Graph

DeepPath: A Reinforcement Learning Method for Knowledge Graph

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning Wenhan Xiong, Thien Hoang, William Wang Department of Computer Science UC Santa Barbara 1 Knowledge Graphs English Sp ok en I personLanguages es ag Neal

McDonough nationality-1 United States countryOfOrigin castActor Band of Brothers Actor gu co un try an nL rso

Caesars Entertain serviceLocation-1 pe n -1 serviceLanguage profession Tom Hanks awardWorkWinner 2 Reasoning on Knowledge Graph n -1 Sp

ok en I personLanguages es ag Neal McDonough nationality-1 United States countryOfOrigin castActor Band of Brothers Actor

gu co un try an nL rso Caesars Entertain serviceLocation-1 pe Query Node: Band of Brothers English serviceLanguage Query Relation: tvProgramLanguage Query: tvProgramLanguage(Band of Brothers, ?) profession

Tom Hanks awardWorkWinner 3 Related Work Path-based methods Path-Ranking Algorithm (PRA), Lao et al. 2011 Subgraph Feature Extraction, Gardner et al, 2015 RNN + PRA, Neelakantan et al, 2015 Chains of Reasoning, Das et al, 2017 4 Related Work Embedding-based method

TransE, Bordes et al, 2013 Neural Tensor Network, Socher et al, 2013 TransR/CTransR, Lin et al, 2015 Complex Embeddings, Trouillon et al, 2016 5 Our Approach Learning the paths instead of using random walks Model the path finding as a MDP Train a RL agent to find paths Represent the KG with pretrained KG embeddings Use the learned paths as horn clauses 6 Reinforcement Learning Agent

: Environment Agent Multi-layer neural nets (st ) Environment KG modeled as a MDP 1. Richard Sutton, Reinforcement Learning An Introduction 7 Components of MDP Markov decision process continuous states represented with KG embeddings : action space transition probability reward received for each taken action

With pretrained KG embeddings , all relations in the KG 8 Framework English oke nIn -1 ryS p cou nt United States Neal McDonough Actor

Reward ReLU profession Reason Step nationality-1 castActor Tom Hanks countryOfOrigin Band of Brothers State ReLU es

ag gu serviceLocation-1 personLanguages Next State Lan on Caesars Entertain rs pe serviceLanguage awardWorkWinner Softmax

(a| s) 9 Reward Functions Global Accuracy Path Efficiency Path Diversity 10 Training with Policy Gradient Monte-Carlo Policy Gradient (REINFORCE, William, 1992) 11 Challenge Typical RL problems Atari games (Mnih et al., 2015): 4~18 valid actions

AlphaGo (Silver et al. 2016): ~250 valid actions Knowledge graph reasoning: >= 400 actions Issue: large action (search) space -> poor convergence properties 12 Supervised (Imitation) Policy Learning Use randomized BFS to retrieve a few paths Do imitation learning using the retrieved paths All the paths are assigned with +1 reward 13 Datasets and Preprocessing Dataset # of Entities # of Relations

# of Triples # of Tasks FB15k-2371 14,505 237 310,116 20 NELL-995 75,492 200 154,213 12

FB15k-237: Constructed from FB15k (Bordes et al., 2013), redundant relations removed NELL-995: Constructed from the 995th iteration of NELL system (Carlson et al., 2010b) Dataset processing Remove useless relations: haswikipediaurl, generalizations, etc Add inverse relation links to the knowledge graph Remove the triples with task relations 1 . Toutanova et al. Representing text for joint embedding of text and knowledge bases 14 Effect of Supervised Policy Learning x-axis: number of training episodes y-axis: success rate calculated on a held-out test set -> Re-train the agent using reward functions 15

Inference Using Learned Paths Path as logical formula (horn-clauses) FilmCountry: actorFilm-1 -> personNationality PersonNationality: placeOfBirth -> locationContains-1 etc Bi-directional path-constrained search Check whether the formulas hold for entity pairs Uni-directional search bi-directional search 16 Link Prediction Result Tasks

PRA Ours TransE TransR worksFor 0.681 0.711 0.677 0.692 atheletPlaysForTeam 0.987 0.955

0.896 0.784 athletePlaysInLeague 0.841 0.960 0.773 0.912 athleteHomeStadium 0.859 0.890 0.718

0.722 teamPlaysSports 0.791 0.738 0.761 0.814 orgHirePerson 0.599 0.742 0.719 0.737 personLeadsOrg

0.700 0.795 0.751 0.772 0.675 0.796 0.737 0.789 Overall Mean average precision on NELL-995 17

Qualitative Analysis Path length distributions 18 Qualitative Analysis Example learned paths placeOfBirth -> locationContains -1 personNationality: peoplePlaceLived -> locationContains-1 peopleMariage -> locationOfCeremony -> locationContains-1 tvCountryOfOrigin -> countryOfficialLanguage tvProgramLanguage: tvCountryOfOrigin -> filmReleaseRegion-1 -> filmLanguage tvCastActor -> personLanguage 19

Conclusion and Future Work Conclusions Propose a RL framework for KG reasoning Controllable paths finder (walker) in KG Combine path-based and embedding-based methods Future Directions Adversarial learning to give rewards Joint reasoning with KG triples and text 20 Thanks! Code: https://github.com/xwhan/DeepPath Dataset: http://cs.ucsb.edu/~xwhan/datasets/NELL-995.zip 21 Bi-directional search Relation link belongs to the reasoning path Think about supernode which has neighbors linked by relation But only the path with can reach the tail entity

If we go from tail to head, we will encounter first, by relation link , is the only neighbor 22 Feature Engineering in PRA vs DRL Reward learning is different from feature engineering DRL enable learning on the symbolic space We can try to learn better reward functions in the future, which can further boost the performance 23

Recently Viewed Presentations

  • Locum - vts.wm.hee.nhs.uk

    Locum - vts.wm.hee.nhs.uk

    Locum jobs Review your needs & be clear with surgeries Honesty & commitment Be organised Contract or not Now Folder & Stationery CV GMC Covering letter P60 Invoices Certificates Performers list - CCT, Indemnity, Copy for folder Speak to current...
  • Bacteria - Google

    Bacteria - Google

    Lab 6A Warm-Up What are plasmids? What gene(s) are on the plasmid being used in this lab? How can we know if cells have been successfully transformed in this lab? Bacteria Chapter 27.2 What you need to know: Mechanisms that...
  • Chapter 4 Authentication Applications Henric Johnson Blekinge Institute

    Chapter 4 Authentication Applications Henric Johnson Blekinge Institute

    Outline Security Concerns Kerberos X.509 Authentication Service Recommended reading and Web Sites Security Concerns key concerns are confidentiality and timeliness to provide confidentiality must encrypt identification and session key info which requires the use of previously shared private or public...
  • Using Context to Infer meaning

    Using Context to Infer meaning

    Definition-the author explains the meaning of the word right in the sentence or selection Synonym-the author uses a similar word Antonym-the author uses a opposite word Example-the author provides an example General-the author provides several words or statements that give...
  • RTP/RTCP and RTSP multimedia streaming protocols for the Internet

    RTP/RTCP and RTSP multimedia streaming protocols for the Internet

    RTP/RTCP, RTSP, and RSVP Multimedia protocols for the Internet Jim Chou and Thinh Nguyen Outline of the presentation 1- the context 2- the RTP/RTCP protocols 3- the RTSP protocol 4- the RSVP protocol PART 1: The context A general view...
  • Présentation PowerPoint

    Présentation PowerPoint

    Energies fossiles/énergies renouvelables. Vers une sexualité responsable. Le changement climatique. STAGE 1. STAGE 2. STAGE 3. Author: Cécile GILLES Created Date: 08/24/2019 03:22:37 Title: Présentation PowerPoint Last modified by:
  • The Impact of Available Learnscapes on Teachers' Pedagogies ...

    The Impact of Available Learnscapes on Teachers' Pedagogies ...

    Learnscapes as Pedagogical Tools: Understanding Teachers' Levels of Use. Keith Skamp Centre for Children & Young People School of Education Southern Cross University
  • International Business by Daniels and Radebaugh Chapter 3

    International Business by Daniels and Radebaugh Chapter 3

    International Business by Daniels and Radebaugh Chapter 3 Political and Legal Environments Facing Business Objectives To discuss the different functions that political systems perform To compare democratic and totalitarian political regimes and discuss how they can influence managerial decisions To...