Division of semantic labor in the Global WordNet Grid
Piek Vossen, VU University Amsterdam
German Rigau, University of the Basque Country
5th Global Wordnet Conference, Mumbai, India, Jan 30 - Feb 5, 2010

Overview
- KYOTO as a domain implementation of the Global Wordnet Grid
- Scope of knowledge integration
- Division of linguistic labor
- How to integrate resources?
- How to make inferences?

KYOTO: some statistics
- European-Asian project
- March 2008 - March 2011
- 7 countries (The Netherlands, Italy, Germany, Spain, Taiwan, Japan, Czech Republic)
- 12 sites
- Universities & research institutes: VUA, CNR-ILC, CNR-IIT, BBAW, EHU, AS, NICT, Masaryk
- Companies: Synthema, Irion
- User organizations: ECNC, WWF
- 7 languages (English, Italian, Japanese, Dutch, Spanish, Basque, Chinese)

KYOTO overall architecture
[Figure: Overview of the KYOTO process. Component labels: Wordnets & Ontology; Linguistic Processor; Multilingual Knowledge Base; Semantic & Syntactic representation; Kyoto Annotation Format; Kybot Fact Extractor; Wikyoto Wiki Editor; Tybot Term Extractor; Fact Base; Term Base.]

Applying ontology mappings
GWC2010, Mumbai

Global Wordnet Grid
[Figure: the Global Wordnet Grid, with nodes labeled Domain, Wn and V.]

Available repositories in KYOTO (environment domain)
- Term database: 500,000 terms per 1,000 documents per language
- Open data projects:
  - DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films and 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples).
  - GeoNames: 8 million geographical names; 6.5 million unique features, of which 2.2 million populated places and 1.8 million alternate names
- Domain thesauri and taxonomies: Species 2000: 2.1 million species
- Wordnets for 7 languages: about 50,000 to 120,000 synsets per language
- Ontologies: SUMO, DOLCE, SIMPLE

Kyoto Knowledge Base
[Figure: layers of the Kyoto Knowledge Base. Labels: DOLCE/SUMO Ontology; Base concepts; OntoWordnet; Wn; Domain; V; T; Terms; Species; DBPedia; 500K; 2,100K.]

Species in the ontology
- Implies storing 2.1 million species twice in the ontology.

Should all knowledge be stored in the central ontology?
- Vocabularies are too large for full inferencing with current reasoners
- Vocabularies are linguistically too diverse to be represented in an ontology
- The inferencing capabilities of formal ontologies are not needed for all levels of knowledge

Modeling knowledge in a domain
Knowledge needs to be divided over different lexical and ontological layers:
- Precisely define the relations between lexical and ontological layers
- Precisely define the inferencing based on the distributed knowledge layers

Division of linguistic labor principle
Putnam 1975:
- No need to know all the necessary and sufficient properties to determine if something is "gold"
- Assume that there is a way to determine these properties and that domain experts know how to recognize instances of these concepts
- Speakers can still use the word "gold" and communicate useful information

Division of semantic labor principle
Digital version of Putnam (1975):
- The computer does not need to have all the necessary and sufficient properties to determine if something is a "European tree frog"
- The computer assumes that there is a way to determine this and that domain experts (people) know how to recognize instances of these concepts
- Computers can still reason with semantics and do useful stuff with textual data

What does the computer need to know?
Distinction between rigid and non-rigid (Welty & Guarino 2002):
- Being a "cat" is essential to an individual's existence and therefore rigid
- Being a "pet" is a temporary role and therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist
- Felix is born as a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while he continues to exist as a cat
- All 2.1 million species are rigid concepts

What does the computer need to know?
Roles and processes in documents have more information value than the defining properties of species:
- Species are defined in terms of physical properties already known to experts
- Roles such as "invasive species", "migration species", "threatened species" express THE important properties of instances of species
- Roles are typically the terms we learn from the text, not the species!

Wordnet-ontology relations
Rigid synset relations to ontology:
  Synset:Endurant(Object); Synset:Perdurant(Event); Synset:Quality
  - sc_equivalenceOf (= relation in WN-SUMO) or
  - sc_subclassOf (+ relation in WN-SUMO)

Non-rigid synset relations to ontology:
  Synset:Role; Synset:Endurant(Object); Synset:Perdurant(Event)
  - sc_domainOf: range of ontology types that restricts a role
  - sc_playRole: role that is being played
  - sc_participantOf: the process in which the role is played

Rigidity can be detected automatically (Rudify, 80% precision, IAG 80%) and is stored in wordnets as attributes of synsets.

Global Wordnet Grid Model
[Figure: English Wordnet in WN-LMF: bird_1_N (rigid), sc_equivalentOf bird; migration_bird_1_N (non-rigid), hyponym link, sc_domainOf bird, sc_playRole done-by. KYOTO Ontology in OWL-DL (Extension of DOLCE LT): classes organism, bird, migration; relations subclass, done-by, has-path, has-destination, has-source.]

Global Wordnet Grid Model
[Figure: KYOTO Ontology in OWL-DL (Extension of DOLCE LT); English Wordnet in WN-LMF: bird_1_N sc_equivalentOf bird.]
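
The wordnet-to-ontology relations listed above can be held in a very small data structure. The sketch below is only an illustration: the relation names come from the slides (the list spells the first one sc_equivalenceOf, the figure sc_equivalentOf), while the `Synset` class, its fields and the `sc_participantOf` target for migration_bird_1_N are assumptions made for the example, not the WN-LMF schema.

```python
from dataclasses import dataclass, field

# Relation names as listed on the slides; grouping them in sets is an assumption.
RIGID_RELATIONS = {"sc_equivalenceOf", "sc_subclassOf"}
NON_RIGID_RELATIONS = {"sc_domainOf", "sc_playRole", "sc_participantOf"}


@dataclass
class Synset:
    """Hypothetical container for a synset and its ontology mappings."""
    synset_id: str
    rigid: bool
    mappings: list = field(default_factory=list)  # (relation, ontology label) pairs

    def add_mapping(self, relation, target):
        # Rigid synsets map directly to types; non-rigid synsets map to roles.
        allowed = RIGID_RELATIONS if self.rigid else NON_RIGID_RELATIONS
        if relation not in allowed:
            raise ValueError(f"{relation} is not allowed for {self.synset_id}")
        self.mappings.append((relation, target))


bird = Synset("bird_1_N", rigid=True)
bird.add_mapping("sc_equivalenceOf", "bird")

migration_bird = Synset("migration_bird_1_N", rigid=False)
migration_bird.add_mapping("sc_domainOf", "bird")
migration_bird.add_mapping("sc_playRole", "done-by")
# Assumed for illustration: the process in which the role is played.
migration_bird.add_mapping("sc_participantOf", "migration")
```

Keeping the rigidity flag on the synset mirrors the statement that rigidity is stored in wordnets as a synset attribute; the check in `add_mapping` simply enforces the split between the two relation groups.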
[Figure labels: Eleutherodactylus augusti; barking frog; Term database, 500,000 terms: endemic frog, endangered frog, poisonous frog, alien frog.]

How to make inferences?
- SPARQL queries to large Virtuoso databases: aligned Species 2000, DBPedia
- SQL queries to the term database
- Graph matching on wordnets stored in DebVisDic
- Reasoning on a small ontology

Ontotagger applied to KAF
Apply WSD to every term in the KAF representation of a text.
For each term in the KAF representation of a text:
(a) If the term has a wordnet synset (from WSD), check for ontology mappings; if there are none, traverse the wordnet hierarchy to find the first mapping.
(b) Else check the SKOS database for a wordnet mapping; if necessary, traverse broader relations up to the first wordnet mapping and go to (a).
(c) Else check the term database for wordnet mappings; if necessary, traverse parent relations up to the first wordnet mapping and go to (a).
Collect all mappings from the ontology and all (relevant) ontological implications and insert them into the KAF representation of the text.
KYOTO Project meeting, Jan 13-14th 2010, PolyU Hong Kong

Examples
1. Migration birds in the Humber Estuary.
2. The migration of birds to the Humber Estuary
3. Bird migration in the Humber Estuary
4. Birds that migrate to the Humber Estuary

Annotation of ontological implications in KAF
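
As a rough illustration of the Ontotagger steps (a) to (c), the fallback from wordnet synset to SKOS database to term database could be sketched as follows. All dictionaries are toy stand-ins with invented entries, and the function names `climb` and `ontotag` are hypothetical; the real system works on KAF documents and external databases, which this sketch does not model.

```python
# Toy resources; every entry here is an illustrative assumption.
ONTOLOGY_MAPPINGS = {"frog_1_N": [("sc_equivalenceOf", "frog")]}
HYPERNYMS = {"tree_frog_1_N": "frog_1_N"}               # wordnet hierarchy
SKOS_TO_WORDNET = {"frogs": "frog_1_N"}                 # SKOS concept -> synset
SKOS_BROADER = {"Eleutherodactylus augusti": "frogs"}   # SKOS broader links
TERM_TO_WORDNET = {"frog": "frog_1_N"}                  # term -> synset
TERM_PARENT = {"endangered frog": "frog"}               # term parent links


def climb(start, direct, parent):
    """Follow parent links from start until a key of `direct` is reached."""
    node, seen = start, set()
    while node is not None and node not in seen:
        if node in direct:
            return direct[node]
        seen.add(node)
        node = parent.get(node)
    return None


def ontotag(term, synset=None):
    """Return the ontology mappings for a term, or [] if none can be found."""
    if synset is None:
        # Step (b): SKOS database, climbing broader relations.
        synset = climb(term, SKOS_TO_WORDNET, SKOS_BROADER)
    if synset is None:
        # Step (c): term database, climbing parent relations.
        synset = climb(term, TERM_TO_WORDNET, TERM_PARENT)
    if synset is None:
        return []
    # Step (a): ontology mapping of the synset or of its nearest hypernym.
    return climb(synset, ONTOLOGY_MAPPINGS, HYPERNYMS) or []


print(ontotag("tree frog", synset="tree_frog_1_N"))
print(ontotag("Eleutherodactylus augusti"))
print(ontotag("endangered frog"))
```

With this toy data, all three calls end at the mapping stored for frog_1_N, which is the point of the layered lookup: only the small ontology needs the mapping, and the larger vocabularies reach it by traversal.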

Kybot profiles
IF T1 + to + T2 & T1.impliedType="change_of_location" & T1.impliedRole="has-target" & T2.Type="location" THEN
IF T1 + from + T2 & T1.impliedType="change_of_location" & T1.impliedRole="has-source" & T2.Type="location" THEN
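
The two profiles above can be read as pattern matches over a sequence of annotated terms. The sketch below is a simplified illustration, not the Kybot implementation: the `Term` class and `apply_profiles` function are hypothetical, and because the THEN parts are not given on the slide, the (T1, role, T2) triple that the function emits is an assumption about what a matched profile would record.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """Hypothetical annotated term; field names are assumptions."""
    lemma: str
    onto_type: str = ""          # corresponds to T.Type in the profile
    implied_type: str = ""       # corresponds to T.impliedType
    implied_roles: tuple = ()    # corresponds to T.impliedRole


# (preposition, T1.impliedType, T1.impliedRole, T2.Type), taken from the slide.
PROFILES = [
    ("to", "change_of_location", "has-target", "location"),
    ("from", "change_of_location", "has-source", "location"),
]


def apply_profiles(terms):
    """Return (T1 lemma, role, T2 lemma) for each T1 + preposition + T2 match."""
    facts = []
    for t1, prep, t2 in zip(terms, terms[1:], terms[2:]):
        for word, implied_type, role, t2_type in PROFILES:
            if (prep.lemma == word
                    and t1.implied_type == implied_type
                    and role in t1.implied_roles
                    and t2.onto_type == t2_type):
                facts.append((t1.lemma, role, t2.lemma))
    return facts


# Simplified input: "migration to Humber Estuary" (intervening words omitted).
terms = [
    Term("migration", implied_type="change_of_location",
         implied_roles=("has-target", "has-source")),
    Term("to"),
    Term("Humber Estuary", onto_type="location"),
]
print(apply_profiles(terms))
```

The implied type and roles on the first term are exactly the kind of ontological implications that the tagging step inserts into the text representation, so the profile itself can stay a shallow pattern.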

Kybot Knowledge Patterns

Conclusion: Should all knowledge be stored in the central ontology?
- Vocabularies are too large for full inferencing
- Vocabularies are linguistically too diverse to be represented in an ontology
- The inferencing capabilities of formal ontologies are not needed for all levels of knowledge
- A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers:
  - SKOS vocabularies and term databases
  - wordnet (WN-LMF)
  - ontology (OWL-DL)
- Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning
- Mapping relations that support the division of labor and different types of inferencing and that allow for the encoding of language-specific lexicalizations and restrictions

Conclusions
- Ontologies are abstract and minimal and lexicons are large and rich
- Semantic relations in lexicons are complementary to ontological relations
- Semantic relations expressed in a language system should be compatible with ontologies
- Large vocabularies of types (rigid things in the world) can be mapped to the ontology through combinations of lexical relations and basic ontological mappings
- Lexicalizations of contextual and subjective concepts need to be expressed through more complex relations
- Equivalences across languages are expressed partially through ontological expressions and partially across lexicons

Applying WSD to terms