Division of semantic labor in the Global WordNet


Division of semantic labor in the Global WordNet Grid

Piek Vossen, VU University Amsterdam
German Rigau, University of the Basque Country
5th Global Wordnet Conference, Mumbai, India, Jan 30 - Feb 5, 2010

Overview
- KYOTO as a domain implementation of the Global Wordnet Grid
- Scope of knowledge integration
- Division of linguistic labor
- How to integrate resources?
- How to make inferences?

KYOTO: some statistics
- European-Asian project, March 2008 - March 2011
- 7 countries (The Netherlands, Italy, Germany, Spain, Taiwan, Japan, Czech Republic)
- 12 sites
  - Universities & research institutes: VUA, CNR-ILC, CNR-IIT, BBAW, EHU, AS, NICT, Masaryk
  - Companies: Synthema, Irion
  - User organizations: ECNC, WWF
- 7 languages (English, Italian, Japanese, Dutch, Spanish, Basque, Chinese)

KYOTO: overall architecture
Overview of the KYOTO process
[Figure: KYOTO process. Labels: Wordnets & Ontology; Linguistic Processor; Multilingual Knowledge Base; Semantic & Syntactic representation; Kyoto Annotation Format; Kybot Fact Extractor; Fact Base; Wikyoto Wiki Editor; Tybot Term Extractor; Term Base; Applying ontology mappings]

GWC2010, Mumbai

Global Wordnet Grid

[Figure: Global Wordnet Grid. Labels: Wn (wordnets); Domain; V; Base concepts; DOLCE/SUMO Ontology; OntoWordnet]

Available repositories in KYOTO
Environment domain
- Term database: 500,000 terms per 1,000 documents per language
- Open data projects:
  - DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films and 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples)
  - GeoNames: 8 million geographical names, consisting of 6.5 million unique features, of which 2.2 million are populated places, and 1.8 million alternate names
- Domain thesauri and taxonomies: Species 2000: 2.1 million species
- Wordnets for 7 languages: about 50,000 to 120,000 synsets per language
- Ontologies: SUMO, DOLCE, SIMPLE

Kyoto Knowledge Base
[Figure: Kyoto Knowledge Base. The Global Wordnet Grid figure with added labels: T; Terms (500K); Species (2,100K); DBPedia]

Species in the ontology
- Implies storing 2.1 million species twice in the ontology.

Should all knowledge be stored in the central ontology?
- Vocabularies are too large for full inferencing with current reasoners
- Vocabularies are linguistically too diverse to be represented in an ontology
- The inferencing capabilities of formal ontologies are not needed for all levels of knowledge

Modeling knowledge in a domain
Knowledge needs to be divided over different lexical and ontological layers:
- Precisely define the relations between lexical and ontological layers
- Precisely define the inferencing based on the distributed knowledge layers

Division of linguistic labor principle
Putnam 1975:
- Speakers do not need to know all the necessary and sufficient properties to determine if something is "gold"
- They assume that there is a way to determine these properties and that domain experts know how to recognize instances of these concepts
- Speakers can still use the word "gold" and communicate useful information

Division of semantic labor principle
Digital version of Putnam (1975):
- The computer does not need to have all the necessary and sufficient properties to determine if something is a "European tree frog"
- The computer assumes that there is a way to determine this and that domain experts (people) know how to recognize instances of these concepts
- Computers can still reason with semantics and do useful stuff with textual data

What does the computer need to know?
Distinction between rigid and non-rigid (Welty & Guarino 2002):
- Being a "cat" is essential to an individual's existence and is therefore rigid
- Being a "pet" is a temporary role and is therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist
- Felix is born a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while he continues to exist as a cat
- All 2.1 million species are rigid concepts

What does the computer need to know?
Roles and processes in documents have more information value than the defining properties of species:
- Species are defined in terms of physical properties already known to experts
- Roles such as "invasive species", "migration species" and "threatened species" express THE important properties of instances of species
- Roles are typically the terms we learn from the text, not the species!

Wordnet-ontology relations

Rigid synset relations to ontology:
- Synset:Endurant(Object); Synset:Perdurant(Event); Synset:Quality:
  - sc_equivalenceOf (= relation in WN-SUMO) or sc_subclassOf (+ relation in WN-SUMO)

Non-rigid synset relations to ontology:
- Synset:Role; Synset:Endurant(Object); Synset:Perdurant(Event)
  - sc_domainOf: range of ontology types that restricts a role
  - sc_playRole: role that is being played
  - sc_participantOf: the process in which the role is played

Rigidity can be detected automatically (Rudify, 80% precision, IAG 80%) and is stored in wordnets as attributes to synsets

Global Wordnet Grid Model
[Figure: English Wordnet in WN-LMF linked to the KYOTO Ontology in OWL-DL (extension of DOLCE LT).
 Wordnet side: bird_1_N (rigid), sc_equivalentOf bird; duck_1_N (rigid); migration_bird_1_N (non-rigid), sc_domainOf bird, sc_playRole done-by, sc_participantOf migration; migration_4_N and migrate_1_V, sc_equivalentOf migration; hyponym links.
 Ontology side: endurant, perdurant, role, object, organism, bird, change-of-location, migration; subclass links; relations done-by, has-path, has-destination, has-source.]

Global Wordnet Grid Model
[Figure: the same English Wordnet and KYOTO Ontology, extended with other wordnets (Dutch Wordnet; Spanish Wn, Basque Wn, Italian Wn, Japanese Wn, Chinese Wn, ....).
 Dutch Wordnet: migrerende dieren_1_N (migrating species, non-rigid), sc_domainOf organism, sc_playRole done-by, sc_participantOf migration; eend_1_N (duck).
 Equivalence links (equivalent_hypernym, equivalent) point to the English synsets eng-30-02356039-n (bird) and eng-30-01254614-n (duck).
 Caption: Cross-lingual equivalence mappings are expressed through wordnet mappings.]

Wordnet to ontology mappings
{create, produce, make}Verb, English
  -> sc_equivalenceOf construction
{artifact, artefact}Noun, English
  -> sc_domainOf physical_object
  -> sc_playRole result-existence
  -> sc_participantOf construction
{kunststof}Noun, Dutch // lit. artifact substance
  -> sc_domainOf amount_of_matter
  -> sc_playRole result-existence
  -> sc_participantOf construction
{meat}Noun, English
  -> sc_domainOf cow, sheep, pig
  -> sc_playRole patient
  -> sc_participantOf eat
{ , , }Noun, Chinese
  -> sc_domainOf animal
  -> sc_playRole patient
  -> sc_participantOf eat
{ ,, ,, , } Noun, Arabic
  -> sc_domainOf cow, sheep
  -> sc_playRole patient
  -> sc_participantOf eat

Wordnet to ontology mappings
{teacher}Noun, English
  -> sc_domainOf human
  -> sc_playRole done-by
  -> sc_participantOf teach
{leraar}Noun, Dutch // lit. male teacher
  -> sc_domainOf man
  -> sc_playRole done-by
  -> sc_participantOf teach
{lerares}Noun, Dutch // lit. female teacher
  -> sc_domainOf woman
  -> sc_playRole done-by
  -> sc_participantOf teach

Wordnet-LMF

WN-LMF Synset relations
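
As a rough illustration of how the sc_ relations listed in the mapping slides could be held as synset relations, here is a minimal Python sketch. It is not the WN-LMF schema: the class name SynsetMapping, its field names and the helper same_role are hypothetical and chosen only for this example. The mapping values themselves are copied from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SynsetMapping:
    """One synset plus its links into the ontology (illustrative structure only)."""
    lemmas: List[str]
    pos: str
    language: str
    rigid: bool                                          # rigidity attribute stored on the synset
    equivalence_of: Optional[str] = None                 # sc_equivalenceOf
    domain_of: List[str] = field(default_factory=list)   # sc_domainOf
    play_role: Optional[str] = None                      # sc_playRole
    participant_of: Optional[str] = None                 # sc_participantOf

    def triples(self) -> List[Tuple[str, str]]:
        """Return the mapping as (relation, ontology target) pairs."""
        out: List[Tuple[str, str]] = []
        if self.equivalence_of:
            out.append(("sc_equivalenceOf", self.equivalence_of))
        out.extend(("sc_domainOf", t) for t in self.domain_of)
        if self.play_role:
            out.append(("sc_playRole", self.play_role))
        if self.participant_of:
            out.append(("sc_participantOf", self.participant_of))
        return out


def same_role(a: SynsetMapping, b: SynsetMapping) -> bool:
    """True when two synsets play the same role in the same process (hypothetical helper)."""
    return a.play_role is not None and (
        (a.play_role, a.participant_of) == (b.play_role, b.participant_of)
    )


# Values taken from the mapping slides.
create = SynsetMapping(["create", "produce", "make"], "Verb", "English",
                       rigid=True, equivalence_of="construction")
teacher = SynsetMapping(["teacher"], "Noun", "English", rigid=False,
                        domain_of=["human"], play_role="done-by", participant_of="teach")
leraar = SynsetMapping(["leraar"], "Noun", "Dutch", rigid=False,
                       domain_of=["man"], play_role="done-by", participant_of="teach")

print(teacher.triples())
print(same_role(teacher, leraar))  # same role and process, different domain restriction
```

The point of the sketch is that {teacher} and {leraar} differ only in their sc_domainOf restriction, so the language-specific part of the lexicalization stays in the wordnet while the role and the process are shared through the ontology.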

Division of labor in knowledge sources
[Figure: four aligned knowledge sources.
 - SKOS database (2.1 million species): Animalia, Chordata, Amphibia, Anura, Leptodactylidae, Eleutherodactylus, Eleutherodactylus atrabracus, Eleutherodactylus augusti (barking frog)
 - Wordnet-LMF (100,000 synsets): animal:1 (Base Concept); chordate:1; vertebrate:1, craniate:1; amphibian:3; frog:1, toad:1, toad frog:1, anuran:1, batrachian:1, salientian:1
 - Ontology-OWL-DL (2,000 types): endurant, perdurant, physical-object, organism, endanger
 - Term database (500,000 terms): endemic frog, endangered frog, poisonous frog, alien frog]

How to make inferences?
- SPARQL queries to large Virtuoso databases: aligned Species 2000, DBPedia
- SQL queries to the term database
- Graph matching on wordnets stored in DebVisDic
- Reasoning on a small ontology

Ontotagger applied to KAF
- Apply WSD to every term in the KAF representation of a text
- For each term in the KAF representation of a text:
  (a) If it has a wordnet synset (WSD), check for ontology mappings; if there are none, traverse the wordnet hierarchy to find the first mapping
  (b) Else check the SKOS database for a wordnet mapping; if necessary, traverse broader relations up to the first wordnet mapping and go to (a)
  (c) Else check the term database for wordnet mappings; if necessary, traverse parent relations up to the first wordnet mapping and go to (a)
- Collect all mappings from the ontology and all (relevant) ontological implications and insert them into the KAF representation of the text

KYOTO Project meeting, Jan 13-14th 2010, PolyU Hong Kong

Examples
1. Migration birds in the Humber Estuary
2. The migration of birds to the Humber Estuary
3. Bird migration in the Humber Estuary
4. Birds that migrate to the Humber Estuary

Annotation of ontological implications in KAF
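
The Ontotagger lookup order described above (wordnet synset first, then the SKOS database, then the term database) can be sketched in Python as follows. This is a toy under stated assumptions, not the KYOTO implementation: the dictionaries stand in for the real databases, the function names climb and ontotag are hypothetical, and the mapping from animal:1 to organism is assumed purely for illustration. The species, synset and term names come from the "Division of labor in knowledge sources" slide.

```python
from typing import Dict, Optional

# Toy stand-ins for the three layers; every entry is illustrative.
ONTOLOGY_MAPPING: Dict[str, str] = {"animal:1": "organism"}   # synset -> ontology type (assumed)
WORDNET_HYPERNYM: Dict[str, str] = {
    "frog:1": "amphibian:3",
    "amphibian:3": "vertebrate:1",
    "vertebrate:1": "chordate:1",
    "chordate:1": "animal:1",
}
SKOS_TO_WORDNET: Dict[str, str] = {"Anura": "frog:1"}         # SKOS concept -> synset
SKOS_BROADER: Dict[str, str] = {
    "Eleutherodactylus augusti": "Eleutherodactylus",
    "Eleutherodactylus": "Leptodactylidae",
    "Leptodactylidae": "Anura",
}
TERM_TO_WORDNET: Dict[str, str] = {"frog": "frog:1"}          # domain term -> synset
TERM_PARENT: Dict[str, str] = {"endangered frog": "frog"}


def climb(start: str, direct: Dict[str, str], parent: Dict[str, str]) -> Optional[str]:
    """Walk up `parent` links from `start` and return the first value found in `direct`."""
    node: Optional[str] = start
    seen = set()
    while node is not None and node not in seen:   # `seen` guards against cycles
        seen.add(node)
        if node in direct:
            return direct[node]
        node = parent.get(node)
    return None


def ontotag(term: str, synset: Optional[str] = None) -> Optional[str]:
    """Return the ontology type implied by a term, or None if no layer knows it."""
    if synset is None:
        if term in SKOS_TO_WORDNET or term in SKOS_BROADER:
            # (b) SKOS database: follow broader relations to the first wordnet mapping
            synset = climb(term, SKOS_TO_WORDNET, SKOS_BROADER)
        elif term in TERM_TO_WORDNET or term in TERM_PARENT:
            # (c) term database: follow parent relations to the first wordnet mapping
            synset = climb(term, TERM_TO_WORDNET, TERM_PARENT)
    if synset is None:
        return None
    # (a) wordnet: follow hypernyms to the first synset with an ontology mapping
    return climb(synset, ONTOLOGY_MAPPING, WORDNET_HYPERNYM)


print(ontotag("frog", synset="frog:1"))
print(ontotag("Eleutherodactylus augusti"))
print(ontotag("endangered frog"))
```

In this sketch all three lookups end at the same ontology type, which is the intended effect of the layering: only the small ontology needs a mapping, and the large vocabularies reach it through their own hierarchies.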

Kybot profiles
IF T1 + to + T2
  & T1.impliedType="change_of_location"
  & T1.impliedRole="has-target"
  & T2.Type="location"
THEN
IF T1 + from + T2
  & T1.impliedType="change_of_location"
  & T1.impliedRole="has-source"
  & T2.Type="location"
THEN
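
The two profiles above can be read as pattern rules over a sequence of annotated terms. The following Python sketch shows one way such a rule could be checked. The consequent (THEN part) is not preserved in the slides, so emitting a (T1, role, T2) triple is an assumption made here for illustration, as are the Term class, the PROFILES table and the choice to store several implied roles per term.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Term:
    """A term with the ontological annotations the profiles test (hypothetical structure)."""
    lemma: str
    onto_type: Optional[str] = None        # T.Type in the profile
    implied_type: Optional[str] = None     # T.impliedType in the profile
    implied_roles: Tuple[str, ...] = ()    # T.impliedRole; several roles allowed here by assumption


# Preposition -> role required on T1, taken from the two profiles.
PROFILES: Dict[str, str] = {"to": "has-target", "from": "has-source"}


def match_profiles(terms: List[Term]) -> List[Tuple[str, str, str]]:
    """Scan adjacent (T1, preposition, T2) triples and collect matches.

    The output format is an assumption; the slides do not show the THEN part.
    """
    facts: List[Tuple[str, str, str]] = []
    for t1, prep, t2 in zip(terms, terms[1:], terms[2:]):
        role = PROFILES.get(prep.lemma)
        if (role is not None
                and t1.implied_type == "change_of_location"
                and role in t1.implied_roles
                and t2.onto_type == "location"):
            facts.append((t1.lemma, role, t2.lemma))
    return facts


# Example 4 from the slides: "Birds that migrate to the Humber Estuary" (function words dropped).
sentence = [
    Term("bird"),
    Term("migrate", implied_type="change_of_location",
         implied_roles=("has-source", "has-target")),
    Term("to"),
    Term("Humber Estuary", onto_type="location"),
]
facts = match_profiles(sentence)
print(facts)
```

Because the rule tests implied ontological types rather than specific words, the same profile would apply to any term that the ontology tagging marks as a change of location.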

Kybot Knowledge Patterns

Conclusion: Should all knowledge be stored in the central ontology?
- Vocabularies are too large for full inferencing
- Vocabularies are linguistically too diverse to be represented in an ontology
- The inferencing capabilities of formal ontologies are not needed for all levels of knowledge
- A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers:
  - SKOS vocabularies and term databases
  - wordnet (WN-LMF)
  - ontology (OWL-DL)
- Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning
- Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions

Conclusions
- Ontologies are abstract and minimal and lexicons are large and rich
- Semantic relations in lexicons are complementary to ontological relations
- Semantic relations expressed in a language system should be compatible with ontologies
- Large vocabularies of types (rigid things in the world) can be mapped to the ontology through combinations of lexical relations and basic ontological mappings
- Lexicalizations of contextual and subjective concepts need to be expressed through more complex relations
- Equivalences across languages are expressed partially through ontological expressions and partially across lexicons

Applying WSD to terms
