www.referent-tracking.com


R T U New York State Center of Excellence in Bioinformatics & Life Sciences

VUB Leerstoel 2009-2010
Theme: Ontology for Ontologies, theory and applications
Ontologies and Natural Language Understanding
May 20, 2010; 17h00-19h00
Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Room D2.01

Prof. Werner CEUSTERS, MD
Ontology Research Group, Center of Excellence in Bioinformatics and Life Sciences, and Department of Psychiatry, University at Buffalo, NY, USA

Context of this lecture series
  [Diagram of fields: Knowledge Representation, Informatics, Linguistics, Computational Linguistics, Medical Natural Language Understanding, Electronic Health Records, Translational Research, Medicine, Biology, Ontology, Philosophy, Realism-Based Ontology, Referent Tracking, Pharmacogenomics, Pharmacology, Performing Arts, Defense & Intelligence]

Today's topic
  May 20: ontologies and Natural Language Understanding
  [Diagram of fields: Informatics, Linguistics, Computational Linguistics, Medical Natural Language Understanding, Electronic Health Records, Medicine, Realism-Based Ontology]

Amazing technology
  A human being with function-enhancing electronic implants
  A tiny scanner capable of detecting bodily anomalies
  A doctor who is in fact some sort of computer program capable of making medical diagnoses
  Flawless communication between a human and a computer

Or not amazing? Towards a bionic eye
  http://bionicvision.org.au/

Or not? Mobile diagnostics
  SilhouetteMobile: scans and stores information about a wound's width and depth, which helps nurses track healing over time as new tissue fills in the injury
  GlucoPack: reads and transmits glucose readings

Or not? Transhumanism
  "Philosophies of life that seek the continuation and acceleration of the evolution of intelligent life beyond its currently human form and human limitations by means of science and technology, guided by life-promoting principles and values." (Max More)

Beyond natural evolution

... to mind uploading?
  Ray Kurzweil receives National Medal of Technology (1999).

But for today: how to communicate with computers naturally?
  The supercomputer HAL from 2001: A Space Odyssey.

Michael Scott's solution
  http://aboulet.files.wordpress.com/2007/05/traveling-salesmen1.jpg

Better: a combination of various technologies

My interest in NLU: the medical informatics dogma

Fact: computers can only deal with a structured representation of reality:
  structured data: relational databases, spreadsheets
  structured information: XML simulates context
  structured knowledge: rule-based knowledge systems
Conclusion: a need for structured data entry

Structured data entry
Current technical solutions:
  rigid data entry forms
  coding and classification systems
But:
  the description of biological variability requires the flexibility of natural language and it is generally desirable not to interfere with the traditional manner of medical recording (Wiederhold, 1980)
  Initiatives to facilitate the entry of narrative data have focused on the control rather than the ease of data entry (Tanghe, 1997)

Drawbacks of structured data entry
Loss of information:
  qualitatively:
    limited expressiveness of coding and classification systems, controlled vocabularies, and traditional medical terminologies
    use of purpose-oriented systems: don't use data for another purpose than originally foreseen
  quantitatively:
    too time-consuming to code all information manually
Speech recognition and structured data entry forms are not best friends

The pillars of healthcare informatics
  Clinical language: medical narrative
  Clinical terminologies: coding and classification systems, nomenclatures, formal ontologies
  Electronic Health Record Systems

The possibilities
  Text-based EHCRS able to generate structured data
  An EHCR exclusively built around a collection of coded data generated out of free text
  A multimedia EHCRS with clinical narrative registration and structured data generation
  A multimedia EHCRS with structured data entry and text generation
  An EHCR exclusively built around texts generated out of controlled vocabularies
  An EHCR exclusively built around a collection of structured data able to generate text

Main issues of MNLU
Medical natural language understanding is:
  making computers understand medical language
  allowing computers to turn unstructured texts into structured information
Medical NLU is NOT:
  medical reasoning performed by computers
  reducing the richness of clinical language to a closed set of codes

Typical examples of MNLU
  contextual spell checking
  information retrieval
  topic selection
  relevance ranking
  coding and classification
  software agents for clinical studies
  unstructured data registration for structured reporting

Areas for application of MNLU

  Coding patient data
  Structured information extraction from unstructured clinical notes
  Clinical protocols and guidelines
  Assessing patient eligibility for clinical trial entry
  Triggering and alerts
  Linking case descriptions to scientific literature
  Easy access to content
  ... towards a medical semantic web

A wealth of communication-related applications (1)
Speech as input:
  voice recognition: who is the sender?
  speech recognition:
    dictation: what is the corresponding text? (irrespective of meaning)
    command and control
    language learning (pronunciation checking)
    question answering
    spoken natural language understanding

A wealth of communication-related applications (2)
Text as input:

  speech generation (text-to-speech)
  spell checking
  grammar checking
  plagiarism detection
  indexing
  semantic indexing
  topic detection
  document retrieval: return documents that tell me when Bonaparte was born
  information retrieval: find in documents the date Bonaparte was born and return only the date
  clinical coding

Speech generation (1)
  She lives near the highway where three lives were lost.

Speech generation (2)
  Chapter III is about Henry III.

Text-to-speech basics
  http://upload.wikimedia.org/wikipedia/en/a/af/Festival_TTS_Telugu.jpg

Simple speech recognition algorithm
  [Diagram, from the INRIA Parole project. Labels: raw speech, signal analysis, acoustic models, sequential constraints, train, speech frames, acoustic analysis, frame scores, time alignment, word sequence, segmentation]

Dialogue systems with automatic translation
  http://www.oxygen.lcs.mit.edu/images/Speech.jpg

The disambiguation problem
Some examples:
  lives: from "to live" or plural of "life"
  III: as "three" or "the third"
  bow: the weapon or from "to bow"
Statistical models (n-grams):
  most often sufficient
  quite fast analysis
Syntactic analysis
Semantic analysis (deep or shallow)
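As a minimal sketch of the statistical (n-gram) option for the "lives" example, the class below picks a reading from the word that precedes the ambiguous token. The class name, the reading labels and the counts are illustrative assumptions, not taken from any system described in this lecture; a real model would be estimated from a tagged corpus.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy bigram disambiguator: chooses a reading from the preceding word only. */
final class BigramDisambiguator {
    // Counts of (previous word, reading) pairs; hypothetical values, see toyModel().
    private final Map<String, Integer> counts = new HashMap<>();

    private static String key(String previousWord, String reading) {
        return previousWord.toLowerCase(Locale.ROOT) + "|" + reading;
    }

    /** Records one observation of a reading after a given previous word. */
    void observe(String previousWord, String reading) {
        counts.merge(key(previousWord, reading), 1, Integer::sum);
    }

    /** Returns the candidate seen most often after previousWord; ties go to the first candidate. */
    String disambiguate(String previousWord, String... candidates) {
        String best = candidates[0];
        int bestCount = -1;
        for (String candidate : candidates) {
            int n = counts.getOrDefault(key(previousWord, candidate), 0);
            if (n > bestCount) {
                best = candidate;
                bestCount = n;
            }
        }
        return best;
    }

    /** Made-up counts standing in for corpus statistics (assumption). */
    static BigramDisambiguator toyModel() {
        BigramDisambiguator m = new BigramDisambiguator();
        m.observe("she", "VERB");
        m.observe("he", "VERB");
        m.observe("three", "PLURAL_NOUN");
        m.observe("many", "PLURAL_NOUN");
        return m;
    }

    public static void main(String[] args) {
        BigramDisambiguator m = toyModel();
        // "She lives ..." versus "... three lives were lost"
        System.out.println(m.disambiguate("She", "VERB", "PLURAL_NOUN"));
        System.out.println(m.disambiguate("three", "VERB", "PLURAL_NOUN"));
    }
}
```

A single word of left context is enough for this sentence, which is the sense in which n-grams are "most often sufficient"; cases such as "bow" may need the syntactic or semantic analysis listed above.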

A toy ontology for communication (1)
Patterned particular (PP):
  piece of text: combination of characters
  sound wave
  series of signs in sign language, smoke
  combination and sequence of smells?
Some sender generated a PP with the intention to provoke something in some receiver, the PP thus becoming a linguistic patterned particular (LPP):
  standard messages, questions, commands: carry meaning directly encoded in the message
  poems, lies, deceptions, nonsense: no or partial directly encoded information
Being a PP is not sufficient to be an LPP. There has to be a sender!
  a bird or insect flying in a pattern that looks like an LPP in some language

A toy ontology for communication (2)
Aboutness: relation from certain elementary LPPs to real-world entities when created under certain circumstances
  me, I, mine
  current, president, United States, king, France
Pattern types: morphologic, syntactic, semantic and discourse conventions
  current President of the United States
  current king of France

A toy ontology for communication (3)
Questionable entities:
  propositions: sort of factual, linguistically undressed statements about the world
  bare meanings

Text analysis
  The doctor checks Seven of Nine's blood pressure

Syntactic analysis
  [Parse tree of "The doctor checks the blood pressure of Seven of Nine". Node labels: sentence, noun phrase, verb phrase, det ("The", "the"), noun ("doctor"), verb ("checks"), compound noun ("blood pressure"), prepositional phrase, prep ("of"), person name ("Seven of Nine")]

Semantic analysis
  [Same parse tree with semantic annotations: "checks" is a checking with has-agent a person ("The doctor") and has-object a clinical sign ("the blood pressure"); the clinical sign belongs-to a person ("Seven of Nine")]

The doctor uses an instrument
  [Parse tree of "The doctor examines the patient with a hammer" in which the prepositional phrase "with a hammer" attaches to the verb phrase: checking with agent, object and instrument]

Here the patient has the hammer!
  [Parse tree of the same sentence in which "with a hammer" attaches to the noun phrase "the patient": checking with agent and object only]

The problem of reference
  The surgeon examined Maria. She found a small tumor on the left side of her liver. She had it removed three weeks later.
Ambiguities:

  who does the first "she" denote: the surgeon or Maria?
  on whose liver was the tumor found?
  who does the second "she" denote: the surgeon or Maria?
  what was removed: the tumor or the liver?
Here ontology can come to aid.

Ontologies and NLP
A two-way collaboration:
  using NLP techniques to assist the development of ontologies,
  using ontologies to make better NLP applications,
  bootstrapping: NLP applications that require ontologies in some stage and intend to make these ontologies better.

NLU as assistive technology for ontology development

C-Tex: corpus-based term extraction
Based on Deniz Yuret's PhD thesis
good news:
  (a particular) language independent
  automatic linguistic knowledge extractor
    relationships between words
    grammar generation
    term extraction
    synonym / homonym detector (???)
bad news:
  large corpora required (occ > 500 * different tokens)
  big PC required (3.000.000 words/day, DOS, PII-350)

C-Tex: term extraction
  TERM                              Occurrences (5679 reps)
  magnetic resonance                100
  san francisco                     12
  invasive fungal sinusitis         7
  rhinosinusitis disability index   3
  intensive care unit               178
  food allergy                      31
  th1 and th2                       32
  positron emission                 29

C-Tex grammar induction
  [Screenshots: sentence encountered; sentence analyzed]

C-Tex's linguistic principles
Words in natural language sentences:
  tend to collocate with a certain strength,
  are not linked in circular ways,
  have links that don't cross.

C-Tex processing

  [Animation in seven frames: link structures labelled s1 to s12 are built up and revised step by step over the sentence "I saw a man carry a telescope"]
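The two structural principles stated for C-Tex (links are not circular and do not cross) can be checked mechanically. The sketch below is only an illustration of those two constraints, not C-Tex's actual algorithm: the class name, the word-index encoding of links and the example link set for "I saw a man carry a telescope" are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Checks a candidate word-to-word link against the "no crossing" and "no cycle" constraints. */
final class LinkConstraints {

    /** Two links cross if exactly one endpoint of one lies strictly inside the other. */
    static boolean crosses(int[] a, int[] b) {
        int a1 = Math.min(a[0], a[1]), a2 = Math.max(a[0], a[1]);
        int b1 = Math.min(b[0], b[1]), b2 = Math.max(b[0], b[1]);
        return (a1 < b1 && b1 < a2 && a2 < b2) || (b1 < a1 && a1 < b2 && b2 < a2);
    }

    // Union-find lookup with path halving, used to detect cycles.
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** True if the candidate crosses no existing link and does not close a cycle. */
    static boolean canAdd(List<int[]> links, int[] candidate, int wordCount) {
        int[] parent = new int[wordCount];
        for (int i = 0; i < wordCount; i++) {
            parent[i] = i;
        }
        for (int[] link : links) {
            if (crosses(link, candidate)) {
                return false;
            }
            parent[find(parent, link[0])] = find(parent, link[1]);
        }
        return find(parent, candidate[0]) != find(parent, candidate[1]);
    }

    /** Hypothetical links over: 0=I 1=saw 2=a 3=man 4=carry 5=a 6=telescope. */
    static List<int[]> toyLinks() {
        List<int[]> links = new ArrayList<>();
        links.add(new int[] {0, 1}); // I - saw
        links.add(new int[] {1, 3}); // saw - man
        links.add(new int[] {2, 3}); // a - man
        links.add(new int[] {3, 4}); // man - carry
        links.add(new int[] {4, 6}); // carry - telescope
        links.add(new int[] {5, 6}); // a - telescope
        return links;
    }

    public static void main(String[] args) {
        System.out.println(canAdd(toyLinks(), new int[] {1, 4}, 7)); // would close a cycle
        System.out.println(canAdd(toyLinks(), new int[] {2, 5}, 7)); // would cross saw - man
    }
}
```

The collocation-strength principle is deliberately left out here; in a fuller sketch it would decide which of the admissible candidate links to keep.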

Advantages
  Defining the required coverage for a given domain, by
    listing the terms that need to receive a description in the ontology (= inverse annotation)
    listing the relationships that need to be named
  Catch-up mechanism: things already done don't need to be done again
  If a C-Tex without prior knowledge works fine, one with ontological knowledge should work even better
  Builds a grammar

Drawbacks
  very slow
  very sensitive to repeatedly seeing the same documents
  requires very careful training set development

Gap Finder and Web Agent

Domain-specific word detection
  Indiana                 Indianapolis     Inoue          Irani
  Irving                  Ito              Iwanaga        J
  JAMA                    Jaffe            Jain           Jama
  Janus                   Japan            Jk-bind        Jk-binding
  Johannes                Johannsen        Johnson        K
  Kanno                   Kaplan           Karin          Kaye
  Kd                      Keegan           Keller         Kennedy
  Kern                    Kimble           Kirsch         Kishimoto
  Knowles                 Ko               Kozma          L
  L.M.                    LAV              LBF3-bind      LBF3-binding
  LBF4-bind               LBF4-binding     LBF5-and       LBF6-bind
  LBF6-binding            LD               LFA            LMP
  LMP-1-express           LMP-1-induce     LMP-1-mediate  LMP-1-negative
  LMP-1-positive          LMP-1-transfect  LN             LOH
  LPS                     LT               LTR            LTR-CAT
  LTR-Cat                 Laine            Lane           Lanes
  Laurent                 Lechler          Lee            Left
  Lenny                   Lenoir           Leonard        Lett
  Leung                   Levels           Levine         Levy
  Lewis                   Ley              Li             Liebowitz
  Lim                     Lin              Ling           Listeria
  Listeria monocytogenes  Liu              Loisel         London

Kohonen clustering

Kohonen clustering

Statistical relationship discovery
  [Table with columns: context, term, role, value. The row alignment is lost in the source; column contents are listed below.]
  context: EU 6th VAT Directive (in every row)
  term: member state, condition, criterion, member state (repeated), accommodation, committee, service, service, supply of service, accompany, achieve, acquire, allow, animal, apply, authorise, authorise, avoid, breeding, calculating
  role: ACTOR-OF, ACT-UPON, CAUSED-BY, HAS_ACTION
  value: term necessary measure purpose document amount condition method national currency of ecu period rules similar establishment commission agricultural holdings taxable person aim luggage exemption goods identification animals vat member suspension fraud boars turnover

The clique approach
  A clique in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge.
  A clique is maximal iff it is not part of a larger clique.
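The two definitions on the clique slide translate directly into code. The following is a minimal sketch, assuming an undirected graph stored as adjacency sets; the class name and the example words are hypothetical and are not taken from the corpus work cited on the next slides.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Undirected graph with checks for "is a clique" and "is a maximal clique". */
final class CliqueCheck {
    private final Map<String, Set<String>> neighbours = new HashMap<>();

    void addEdge(String a, String b) {
        neighbours.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbours.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Every two distinct vertices in the subset must be connected by an edge. */
    boolean isClique(String... vertices) {
        for (String a : vertices) {
            for (String b : vertices) {
                if (!a.equals(b)
                        && !neighbours.getOrDefault(a, Collections.emptySet()).contains(b)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** A clique is maximal if no outside vertex is adjacent to all of its members. */
    boolean isMaximalClique(String... vertices) {
        if (!isClique(vertices)) {
            return false;
        }
        Set<String> members = new HashSet<>(Arrays.asList(vertices));
        for (Map.Entry<String, Set<String>> e : neighbours.entrySet()) {
            if (!members.contains(e.getKey()) && e.getValue().containsAll(members)) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical word graph; an edge could stand for two words co-occurring in an n-gram. */
    static CliqueCheck toyGraph() {
        CliqueCheck g = new CliqueCheck();
        g.addEdge("cat", "dog");
        g.addEdge("dog", "horse");
        g.addEdge("cat", "horse");
        g.addEdge("dog", "wolf");
        return g;
    }

    public static void main(String[] args) {
        CliqueCheck g = toyGraph();
        System.out.println(g.isMaximalClique("cat", "dog"));          // part of a larger clique
        System.out.println(g.isMaximalClique("cat", "dog", "horse")); // cannot be extended
    }
}
```

This only verifies a given subset; enumerating all maximal cliques of a corpus graph would need a dedicated algorithm such as Bron-Kerbosch, which is not shown here.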

Building cliques out of n-grams
Sorts of cliques in linguistic corpora
Category and hierarchy generation
  [Three slides of figures, all from: Tony Veale. Categories, Cliques and Analogies in Creative Information/Knowledge Management. ICON 2009, Hyderabad, India. http://ltrc.iiit.ac.in/icon_archives/ICON2009/Presentations/Keynote/Categories%20and]

Ontology to improve natural language understanding

Understanding content (1)
  We see: John Doe has a pyogenic granuloma of the left thumb
  The machine sees: [the same sentence rendered in unreadable symbols]

Understanding content (2): the XML misunderstanding
  We see: John Doe, pyogenic granuloma of the left thumb [wrapped in XML tags]
  The machine sees: <> < > <>

Requirements for NLU
1. Knowledge about terms and how they are used in valid constructions within natural language;
2. Knowledge about the world, i.e. how the referents denoted by the terms interrelate in reality and in given types of context;
3. An algorithm that:
   a. is able to calculate a language user's representation of that part of the world described in the utterances that are the subject of the analysis;
   b. can track the ways in which people express what does NOT represent anything in reality (e.g. for medico-legal reasons).
Only a realist ontology (and not an ontology that deals with alternative realities) permits correct disambiguation between 3a and 3b.

Exploit the relationships along the vertices
  [Triangle diagram with vertices: concept, language, referents]
  Halliday's systemic functional grammar: The structures of language are partially determined by our conceptualisation of the world. (Halliday)
  No mental representation without language (Fodor)
  Aristotelian realism
  Meaning is located in the interaction between living beings and the environment (James J. Gibson, Ecological Realism in Psychology)
  Baboons and humans have different cut-off points for discerning "same" objects because our verbal expression for "same" makes the idea of "same" more restrictive. (Fagot and Wasserman, Centre for Research in Cognitive Neuroscience in Marseille)

The content
  [Diagram labels: Language A (Lexicon, Grammar); Language B (Lexicon, Grammar); Proprietary Terminologies (ICPC, SNOMED, ICD, MEDRA, Others ...); Formal Domain Ontology; Linguistic Ontology]

Use of spatial logics

  [Hierarchy diagram of spatial relations. Labels legible in the source: HAS-OVERLAPPING-REGION, HAS-PARTIAL-SPATIAL-OVERLAP, IS-SPATIAL-PART-OF, IS-PROPER-SPAT.-PART-OF, HAS-DISCRETED-REGION, HAS-SPATIAL-PART, HAS-PROPER-SPATIAL-PART, HAS-SPATIAL-POINT-REFERENCE, HAS-CONNECTING-REGION, HAS-DISCONNECTED-REGION. The remaining labels are garbled; they concern externally connecting regions, tangential and non-tangential spatial parts, spatial equivalence, convex-hull relations (in, partly in, inside, outside) and geo-inside / topo-inside relations.]

Example: (canonical) joint anatomy
  joint HAS-HOLE joint space
  joint capsule IS-OUTER-LAYER-OF joint
  meniscus
    IS-INCOMPLETE-FILLER-OF joint space
    IS-TOPO-INSIDE joint capsule
    IS-NON-TANGENTIAL-MATERIAL-PART-OF joint
  joint
    IS-CONNECTOR-OF bone X
    IS-CONNECTOR-OF bone Y
  synovia IS-INCOMPLETE-FILLER-OF joint space
  synovial membrane IS-BONAFIDE-BOUNDARY-OF joint space

Linguistic, domain and BFO-based RUs
  Generalised Possession

  [Diagram of numbered subclass-of hierarchies. Classes: Healthcare phenomenon, Human being, Having a healthcare phenomenon, Patient, Patient at risk, Patient at risk for osteoporosis, Risk Factor, Risk factor for osteoporosis, Osteoporosis. Relations: subclass-of, Has-possessor, Has-possessed, Is-possessor-of, Has-Healthcare-phenomenon, Is-RiskFactor-Of]

Value of the three sorts of RUs
  Linguistic: capture the way language is used
  Domain: capture the way domain experts conceptualize the domain, which is in part reflected by the way they talk about the domain
  BFO-based: capture how matters are believed to be, without referring to linguistic or domain RUs except when they denote the same thing

One should try to maximize the number of BFO-based Representational Units
In this case: base RUs on the Ontology of General Medical Science
  healthcare phenomenon -> bodily feature?
  risk factor -> disposition?
  osteoporosis -> disorder, disease, path. process?

MNLU: the general idea

  [Diagram, Text to Result. Labels: Keywords, ICD-Codes, Discharge letter, MedLine abstracts, English patient record, French patient record, Surgery report, Protocol checking]

MNLU: some requirements
  [Diagram labels: Text, Processor, Result, Domain representation, Goal representation]

Linguistic Application Components
  [Diagram labels: Text, Processor, Result, Domain representation (ontology), Linguistic Knowledge, Task Knowledge, Goal representation]

Implements Rector's "Clean separation of knowledge"
  Conceptual knowledge: the knowledge of sensible domain concepts
  Knowledge of definitions and criteria: how to determine if a concept applies to a particular instance
  Surface linguistic knowledge: how to express the concepts in any given language
  Knowledge of classification and coding systems: how an expression has been classified by such a system
  Pragmatic knowledge: what users usually say or think, what they consider important, how to integrate in software

What does this mean for applications?
  [Diagram labels: Text, Processor, Result, Domain representation (ONTOLOGY), Linguistic Knowledge, Task Knowledge, Goal representation; further box labels: Discourse, Linguistic, Coding, Information, rules]

  [Diagram labels: English P.Rec, French P.Rec, Reports, Keywords, ICD-Codes, Completeness]

Halliday's systemic functional grammar
A complete theory for NLU
  constructivistic basis: language construes human experience
    English: It is raining
    Chinese: The sky drops water
  hence: natural languages are instances of generic schemes
  macro-structure of documents: derive a structural formula
  micro-structure of documents: lexical cohesion, in-conjunction analysis

General Principle of Semantic Mapping
1. Semantic constraints are associated with:
   a) lexemes, or
   b) syntactic classes which generalize over lexemes.
2. A word inherits all constraints associated with each of the syntactic classes it instantiates, as well as any associated with the lexeme itself.
3. Where the lexicon provides multiple semantic interpretations of a word, these are tried in order until one applies (e.g., "with" can be interpreted as HAS_HC_PHENOMENON, HAS_INSTRUMENT, etc.).

Lexicon-specified Mapping
Lexsem rules fix the RU that a particular term can map to, e.g.,
    lexsem present verb CONSULTATION_PROCESS
  The first element defines the root form of the lexeme, so the above example will also be applicable for "presents" and "presenting".
  The second element distinguishes cases of lexical ambiguity, e.g., "present" as a noun.
  Where a lexeme is polysemic, multiple lexsem entries are provided.
  In some cases, a lexeme provides not only an RU, but some structure as well, e.g.,
    lexsem "since" preposition {} {Head.Sem.HAS-CEN-OCCURENCE-SINCE PPHead.Sem}
  (meaning: the concept expressed by the syntactic dominator of "since" is linked by a HAS-CEN-OCCURENCE-SINCE relation to the RU expressed by the NP following "since")

Syntax-specified Mapping
Two reasons for associating mapping information with syntactic features:
  The syntactic feature represents a generalisation over a set of lexemes (e.g., the syntactic feature human-surname contains the mapping information for all surnames).
  The syntactic feature represents a syntactic configuration which itself implies meaning (e.g., passive is not a feature of a word but of a configuration of words).
Syntactic constraints are of two types:
  Specify the class a particular role filler must have (whether syntactic element or conceptual), e.g., Sem.Actor: human (Sem.Actor is a role-chain, meaning the Actor slot of the Sem slot)
  Specify that the fillers of two role-chains are the same, e.g., Sem.Actor = Subj.Sem
Logical combinations of syntactic constraints are possible:
    {and {Head.Sem: COMPLAINING_PROCESS} {Head.Sem.HAS_SAYING PPHead.Sem} }
  ("or" and "not" are also possible)
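Principle 3 of the semantic mapping (multiple interpretations tried in order until one applies) can be illustrated with a small lookup. This is a sketch under stated assumptions, not the mapping engine described in the lecture: the class and method names are hypothetical, and the rule that a reading "applies" when the filler belongs to a required class is a simplification introduced for the example. Only the relation names HAS_HC_PHENOMENON and HAS_INSTRUMENT come from the slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Tries the readings of a lexeme in lexicon order and returns the first one that applies. */
final class OrderedSemanticMapping {

    /** One candidate reading: a relation plus the class its filler must have (assumed constraint). */
    private static final class Reading {
        final String relation;
        final String requiredFillerClass;

        Reading(String relation, String requiredFillerClass) {
            this.relation = relation;
            this.requiredFillerClass = requiredFillerClass;
        }
    }

    private final Map<String, List<Reading>> lexicon = new HashMap<>();
    private final Map<String, String> classOf = new HashMap<>();

    /** Appends a reading; insertion order is the order in which readings are tried. */
    void addReading(String lexeme, String relation, String requiredFillerClass) {
        lexicon.computeIfAbsent(lexeme, k -> new ArrayList<>())
               .add(new Reading(relation, requiredFillerClass));
    }

    /** Stands in for ontological knowledge about a term (hypothetical class labels). */
    void classify(String term, String ontologyClass) {
        classOf.put(term, ontologyClass);
    }

    /** Returns the first relation whose constraint the filler satisfies, if any. */
    Optional<String> interpret(String lexeme, String filler) {
        String fillerClass = classOf.get(filler);
        for (Reading r : lexicon.getOrDefault(lexeme, Collections.emptyList())) {
            if (r.requiredFillerClass.equals(fillerClass)) {
                return Optional.of(r.relation);
            }
        }
        return Optional.empty();
    }

    static OrderedSemanticMapping toy() {
        OrderedSemanticMapping m = new OrderedSemanticMapping();
        m.addReading("with", "HAS_HC_PHENOMENON", "HEALTHCARE_PHENOMENON");
        m.addReading("with", "HAS_INSTRUMENT", "INSTRUMENT");
        m.classify("fever", "HEALTHCARE_PHENOMENON");
        m.classify("hammer", "INSTRUMENT");
        return m;
    }

    public static void main(String[] args) {
        OrderedSemanticMapping m = toy();
        System.out.println(m.interpret("with", "fever"));  // a patient with a fever
        System.out.println(m.interpret("with", "hammer")); // examines ... with a hammer
    }
}
```

The ontology does the disambiguating work here: the same preposition maps to different relations depending on what kind of entity follows it.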

RUs involved in analyzing "Mr. Smith"
  [Diagram with three layers. Ontology: Material Entity, human, male human, name, Is-assigned-name-of. Instance: MrSmith, Is-assigned-name-of, Mr Smith. Text: Mr Smith]

"Mr Smith" analysed syntactically, and features used to drive mapping
  The Orth slot of a word gives its surface string. The append( ) operator joins together its arguments as a single string.
  [Feature network. Features: human-name, titled-human, untitled-human, female-titled, male-titled, genderless-titled, human-surname, human-firstname, prenamed-provided, prename-not-provided. Types: HUMANNAME-TYPE, TITLEDHUMAN-TYPE, HUMANNAME-TYPE3, HUMANNAME-TYPE4. Roles: Title -2, Prename -1]
  Constraints shown:
    female-titled: Title: female-title; Sem: female-human
    male-titled: Title: male-title; Sem: male-human
    titled-human: Title: title
    human-name: Sem: human
    prenamed-provided: Prename: human-firstname; Sem.Assigned_Name = append{Prenam.Orth, Orth}
    prename-not-provided: Sem.Assigned_Name = Orth

Analysis of "An 83-year-old man"
  [Diagram with three layers. Ontology (Dom-ent): human, male human, human age state, HAS-WE-STATE, P-TYPE. Instance: X1 HAS-WE-STATE X2, P-TYPE X3. Syntax: Deict, Epith over "An 83-year-old man"]

Syntactic-Semantic mapping
  Lexicon:
    lexsem man noun MALE_HUMAN
    lexsem $int$-year-old adjective HUMAN_AGE_STATE
  Syntax: one of the constraints (shown in red) on the feature pre-qualified (which introduces the Epith role) fits: [constraint shown as an image]

Example of a bootstrapping approach

Syntactic relationship discovery process
Text processed subsequently by:
  paragrapher
  segmenter
    sentence detection
    tokenisation
    rewriting of abbreviations
    identification of relevant sentences
  parser
  resolver
    reference resolution
  relationship discoverer

Text to be processed
  Sphingosine 1-phosphate induces expression of early growth response-1 and fibroblast growth factor-2 through mechanism involving extracellular signal-regulated kinase in astroglial cells.
  Sato K, Ishikawa K, Ui M, Okajima F.
  Laboratory of Signal Transduction, Institute for Molecular and Cellular Regulation, Gunma University, 339-15 Showa-machi, Maebashi, Japan.
  In rat type I astrocytes and C6 glioma cells, sphingosine 1-phosphate (S1P) clearly induced the expression of fibroblast growth factor-2 (FGF-2) mRNA to an extent comparable to that achieved by platelet-derived growth factor (PDGF) and endothelin. In C6 cells, Western blotting showed that S1P also induced expression of early growth response-1 (Egr-1), one of the immediate early gene products and an essential transcriptional factor for FGF-2 expression. On the other hand, sphingosine, a substrate for sphingosine kinase which forms intracellular S1P, was a very weak activator for the expression of either FGF-2 or Egr-1. The S1P-induced Egr-1 expression was partially inhibited by treatment of the cells with either calphostin C, an inhibitor of protein kinase C (PKC), or pertussis toxin (PTX), and completely inhibited by the combination of these agents. Essentially, the same inhibitory pattern by these agents has been observed for S1P-induced extracellular signal-regulated kinase (ERK) activation. The S1P-induced expression of Egr-1 was also completely inhibited in association with complete inhibition of ERK by PD 98059, an ERK kinase inhibitor. Thus, the S1P-induced activation of the Egr-1/FGF-2 system may be mediated through ERK activation, which may involve at least two signaling pathways, i.e., a PTX-sensitive G-protein-dependent pathway and a PKC-dependent pathway.
  PMID: 10640689 [PubMed - indexed for MEDLINE]

Paragrapher output
Segmenter output
Re-use of resolved abbr.
Parser output
Reference resolution
  [Five slides of screenshots showing the output of each processing step]

Bioinformatics & Life Sciences Domain-specific CUE-words if (domain.equals("PROTEINS")) subjObjVerbs_ar = new Object[] {"abolish", "abolishes", "abolished", "abolishing", "accompany", "accompanies", "accompanied", "accompanying", "acetylate", "acetylates","acetylated","acetylating", "activate", "activates", "activated", "activating", "affect", "affects", "affected", "affecting", ....} if (domain.equals("PROTEINS")) ofByNouns_ar = new Object[] {"acetylation", "activation", "affection", "aggregation", "altering", "amelioration", "antagonization", "association", "augmentation", "binding", "blocking", "blockage",.... }
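A minimal sketch of how a cue-verb list of this kind could drive the discovery of subject-verb-object relationships. This is an illustration under stated assumptions, not the system shown on the slides: the class name `CueVerbSketch`, the method `extract`, the shortened verb set and the input sentence are all hypothetical, and the real pipeline presumably works on parser output rather than on raw tokens.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: naive triple extraction driven by a cue-verb list. */
final class CueVerbSketch {

    // A few third-person forms taken from the subjObjVerbs_ar list; the real list is longer.
    static final Set<String> CUE_VERBS = new HashSet<>(Arrays.asList(
            "abolishes", "accompanies", "acetylates", "activates", "affects"));

    /**
     * Splits a whitespace-tokenized sentence at the first cue verb that has
     * text on both sides and returns "(subject)-VERB-(object)".
     * Returns null when no such cue verb is found.
     */
    static String extract(String sentence) {
        String[] tokens = sentence.trim().split("\\s+");
        for (int i = 1; i < tokens.length - 1; i++) {
            if (CUE_VERBS.contains(tokens[i].toLowerCase(Locale.ROOT))) {
                String subject = String.join(" ", Arrays.copyOfRange(tokens, 0, i));
                String object = String.join(" ", Arrays.copyOfRange(tokens, i + 1, tokens.length));
                return "(" + subject + ")-" + tokens[i].toUpperCase(Locale.ROOT) + "-(" + object + ")";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up input, used only to show the output format.
        System.out.println(extract("protein A activates protein B"));
    }
}
```

Everything left of the verb is treated as subject and everything right of it as object, which is far cruder than a parse-based approach; the sketch only shows where the cue list enters the process.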

Inter-protein relationship discovery

Leptin rapidly inhibits hypothalamic neuropeptide Y secretion and stimulates corticotropin-releasing hormone secretion in adrenalectomized mice .
(leptin)-INHIBITS-(hypothalamic neuropeptide Y secretion)
(leptin)-INHIBITS-(neuropeptide Y)

... special patterns

These results indicate that oTP-1 may prevent luteolysis by inhibiting development of endometrial responsiveness to oxytocin and , therefore , reduce oxytocin-induced synthesis of IP3 and PGF2 alpha .
(oxytocin)-CAUSES-(synthesis of IP3 and PGF2 alpha)
(oxytocin)-CAUSES-(pgf2 alpha)

From syntactic modification to subsumption

(adj)-(noun) :: Cadj-noun IS_A Cnoun
    steroid hormone IS_A hormone
    fetal liver IS_A liver
BUT not:
    binding factor IS_A factor
    total protein IS_A protein
    two domain IS_A domain
Usefulness ? relationship with the Cadj
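The modifier-to-subsumption rule can be sketched as follows. The slides do not say how the unwanted cases are detected, so the stop list of modifiers used here is purely an assumption, as are the names `ModifierSubsumptionSketch`, `BLOCKED_MODIFIERS` and `propose`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: propose "modifier noun IS_A noun" unless the modifier is blocked. */
final class ModifierSubsumptionSketch {

    // Assumption: the counter-examples on the slide are handled with a simple stop list.
    static final Set<String> BLOCKED_MODIFIERS =
            new HashSet<>(Arrays.asList("binding", "total", "two"));

    /**
     * For a two-word phrase "modifier noun", returns "modifier noun IS_A noun".
     * Returns null for blocked modifiers and for phrases that are not two words long.
     */
    static String propose(String phrase) {
        String[] parts = phrase.trim().split("\\s+");
        if (parts.length != 2) {
            return null;
        }
        if (BLOCKED_MODIFIERS.contains(parts[0].toLowerCase(Locale.ROOT))) {
            return null;
        }
        return parts[0] + " " + parts[1] + " IS_A " + parts[1];
    }

    public static void main(String[] args) {
        System.out.println(propose("steroid hormone"));
        System.out.println(propose("fetal liver"));
        System.out.println(propose("total protein"));
    }
}
```

A stop list is only one possible design; a real system might instead consult the ontology or the part of speech of the modifier.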

NLU in the GALEN project

The place of Galen

[Diagram. Labels: Text; Linguistic Knowledge; Processor; Domain representation; GALEN; Task Knowledge; Goal representation; Result]

The processor at work ...

[Diagram. Labels: Text; Linguistic Knowledge; GALEN; Meaning Representation; Goal Representation; Task Knowledge; Goal representation; Result; Processor; Domain representation]

Some claims by Galen (+)
- Europe-wide endeavour
- Result of work by highly competent researchers and developers
- Clean knowledge kernel of pure medical terminology
- Totally independent from any source or target system
- Openness
- Development not affordable by one single entity

NLP applications around Galen

[Diagram. Labels: C-Tex; Text; Linguistic Knowledge; MultiTale; Linguistic Representation; Cassandra; Galen terminological Knowledge; Meaning Representation]

MultiTale: synsem - tagging

Dura was incised in linear fashion and the scar around the inlet of the reservoir was dissected out until the ventricular catheter was exposed and withdrawn under direct vision.
Dura was incised in linear fashion

MultiTale-II: Galen-ready linguistic representation

valgising osteotomy of humerus
({valgising}5(osteotomy)1{[of]3(humerus)2}4)22

Pre- and postmarker: {} () @# \/ <>
Relationship with the GALEN ontology (exhaustive): link; criterion; descriptor / concept; co-ordination; not represented in GALEN; criterion modifier
Relationship with natural language phenomena (examples): explicit in prepositions, or implicit in adjectives; adjectives, adverbial constructions; nouns, idioms; and, or; function words such as articles, possessive pronouns, etc.; adverbs

Cassandra-II: from LR to CR

({valgising}5(osteotomy)1{[of]3(humerus)2}4)22
((cutting)21 {[TO_ACHIEVE]6((Deed:valgising)7 {[ACTS_ON]17(Pathology:pathologicalposture)18}19)20}5 {[ACTS_ON]3(Anatomy:humerus)2}4)22

Linguistic versus Conceptual repr. (1)

(excision)35 {[of]111 ((cicatrix)2120 {[of]216 (skin)474}0)0}0
(debridement)82 {[of]142 ({palmar}1785 (skin)474)0}0

RefId   Prototype     Conceptual repr.    Linguistic repr.
35      excision      excising            excising
82      debridement   debriding           debriding
111     of            ACTS_ON             THEME
142     of            ACTS_ON             SOURCE
216     of            HAS_LOCATION        SOURCE
474     skin          skin                skin
1785    palmar        IS_PART_OF(palm)    LOCATIVE(palm)
2120    cicatrix      cicatrix            cicatrix

Linguistic versus Conceptual repr. (2)

The Galen view
    ResourseManagementProcess
    InstallingProcess
    LiquidInstallingProcess
    Filling
    Injecting
The linguistic semantic view
    To install [ in ]
    To fill [with ]
    To inject [ in ]
    To inject

Semantic Indexing with and without using ontology

Goals of Semantic Indexing
1. How to identify in a running text those components that carry meaning ?
2. How to assess how relevant these components are in the context of the entire document ?
    - aboutness or characterizing power (NLM MetaMap)
    - topic

Statistics-based systems
- do not possess explicit domain knowledge,
- can only identify words or multi-word units in texts,
    - based on individual document statistics
    - based on corpus statistics
- project these on implicitly constructed concepts that are mathematically justifiable, but that do not necessarily correspond with metaphysical reality,
- are capable of finding those components that qualify as topic markers,
- are poor at identifying all components.
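To make the combination of in-document and corpus statistics concrete, here is a small sketch using TF-IDF. TF-IDF is chosen only as a familiar example of such a measure; the slides do not name the statistics actually used. The class `TfIdfSketch`, its method names and the smoothed IDF formula are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: term weighting from in-document and corpus statistics. */
final class TfIdfSketch {

    /** Lower-cases the text and splits it on non-word characters, dropping empty tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** In-document statistic: relative frequency of the term in one document. */
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        return (double) Collections.frequency(doc, term) / doc.size();
    }

    /** Corpus statistic: smoothed inverse document frequency, log((1 + N) / (1 + df)). */
    static double idf(String term, List<List<String>> corpus) {
        int containing = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                containing++;
            }
        }
        return Math.log((1.0 + corpus.size()) / (1.0 + containing));
    }

    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        // Toy documents, invented for the example.
        List<String> d1 = tokenize("the hormone leptin");
        List<String> d2 = tokenize("the hormone oxytocin");
        List<List<String>> corpus = Arrays.asList(d1, d2);
        System.out.println("leptin:  " + tfIdf("leptin", d1, corpus));
        System.out.println("hormone: " + tfIdf("hormone", d1, corpus));
    }
}
```

A word that occurs in every document gets weight zero, which is one way to see why such systems find topic markers but miss many other meaningful components.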

Concept-based systems
- use explicitly defined concepts to which words, terms or phrases are attached as known grammaticalizations in a specific language.
- attachment may be
    - lexically realised
    - grammatically realised
        - using syntactic grammar and/or semantic grammar
- tend to identify many components,
- are less performant in finding the topics.

TeSSI: Terminology Supported Semantic Indexing
- Based on LinkBase:
    - formal ontologies dealing with time, mereology, partonomy, ... (Smith, Varzi, Cohn, ...)
    - domain ontology structured according to the way languages are influenced by semantics (Bateman)
    - linking towards multiple 3rd party terminologies, classification systems, ontologies, ...
    - multi-lingual
- Combines in-document statistics with spreading activation enforcement in LinkBase
- Implemented as a server

Architectural Overview

[Diagram. Labels: LinkBase Database; JDBC; Java; LinkFactory Server; Server Business Objects; RMI; Corba; Soap; LAN; WAN; Internet; Unix Workstation; PC; Mac; LinkFactory Workbench; Concept tree; Criteria / Full definitions; Linktype tree; Translate; ...; TeSSI Server; Index]

Phrase extraction

Disambiguation

Coding

Intermediate conclusions
- Good results (shown by means of recall/precision studies based on OHSUMED)
- BUT: important effort in building an appropriate ontology (we can live with that because we did it already for healthcare)
- Is there a risk that this effort would ever lose its value ?

A statistics only system

[Figure panels: 22 page full paper; ABSTRACT ONLY]
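Looking back at TeSSI, which is said to combine in-document statistics with spreading activation in LinkBase: the slides give no algorithmic detail, so the following is only a generic sketch of one round of spreading activation over a concept graph. The class `SpreadingActivationSketch`, the `decay` parameter, the update rule and the toy links are assumptions and do not describe the LinkBase implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one round of spreading activation over concept links. */
final class SpreadingActivationSketch {

    /**
     * Every concept keeps its own activation and adds decay * activation
     * to each concept it links to. The input map is not modified.
     */
    static Map<String, Double> spreadOnce(Map<String, Double> activation,
                                          Map<String, List<String>> links,
                                          double decay) {
        Map<String, Double> next = new HashMap<>(activation);
        for (Map.Entry<String, Double> e : activation.entrySet()) {
            List<String> neighbours = links.getOrDefault(e.getKey(), Collections.<String>emptyList());
            for (String neighbour : neighbours) {
                next.merge(neighbour, decay * e.getValue(), Double::sum);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Initial activation: how often each concept was found in the document (toy counts).
        Map<String, Double> activation = new HashMap<>();
        activation.put("steroid hormone", 2.0);
        activation.put("leptin", 1.0);

        // Toy links, invented for the example; not taken from LinkBase.
        Map<String, List<String>> links = new HashMap<>();
        links.put("steroid hormone", Arrays.asList("hormone"));
        links.put("leptin", Arrays.asList("hormone"));

        System.out.println(spreadOnce(activation, links, 0.5));
    }
}
```

The point of the sketch is that a concept never mentioned in the text ("hormone" here) can still receive weight because several mentioned concepts link to it, which pure in-document statistics cannot do.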

How far can these systems go ?

Some positive characteristics:
- Do not require detailed domain knowledge
- Are language independent
- Are able to find complex multi-word units
Some negative (?) characteristics:
- seem to be dependent on document length
- unclear how to link to existing terminologies (find words instead of concepts)

To find this out
- Select from OHSUMED 29 abstracts with stated high relevance for 5 concepts, hence supposed to cover the same topic;
- Sort abstracts in ascending order with respect to document length;
- Concatenate documents to get even larger documents;
- Perform a forecast analysis;
- Compare TeSSI with a statistics-based system.

[Figure: Word, concept and node identification per document (real). x-axis: Document number (1 to 9); y-axis: Count of words, concepts or nodes (1 to 10000); series: Words, Nodes, Concepts]
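The sorting and concatenation steps of this experiment can be sketched as below. It is assumed here that "concatenate documents to get even larger documents" means cumulative concatenation in ascending order of length; the slides do not state this explicitly. The class `CumulativeCorpusSketch` and its methods are hypothetical, and the selection, forecast and comparison steps are not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: build ever larger test documents from sorted abstracts. */
final class CumulativeCorpusSketch {

    /** Counts whitespace-separated words; an empty or blank text has zero words. */
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Sorts the abstracts by ascending word count and returns the growing
     * concatenations: the shortest abstract, then the two shortest joined, and so on.
     */
    static List<String> cumulativeDocuments(List<String> abstracts) {
        List<String> sorted = new ArrayList<>(abstracts);
        sorted.sort(Comparator.comparingInt(CumulativeCorpusSketch::wordCount));
        List<String> result = new ArrayList<>();
        StringBuilder soFar = new StringBuilder();
        for (String text : sorted) {
            if (soFar.length() > 0) {
                soFar.append(' ');
            }
            soFar.append(text.trim());
            result.add(soFar.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder strings standing in for abstracts.
        List<String> docs = cumulativeDocuments(Arrays.asList("c d e", "a", "b f"));
        for (String doc : docs) {
            System.out.println(wordCount(doc) + " words: " + doc);
        }
    }
}
```

Each resulting document could then be indexed by both systems, and the number of identified concepts or nodes plotted against the word count.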

[Figure: Absolute Concept/Node identification (real). x-axis: Word Count (0 to 5000); y-axis: Nr of nodes or concepts (0 to 1800)]

[Figure: Relative Concept/Node identification (real). x-axis: Nr of words (0 to 5000); y-axis: 0 to 0,4; series: concepts, nodes]

[Figure: Concept/Node identification % (forecast). x-axis: Nr of words (0 to 120.000.000); y-axis: 0 to 0,4; series: concepts, nodes]

Conclusions
- The ontological approach that accepts language as a medium of communication provides a very good basis for NLU if associative relationships are prominently present. Hierarchies are not enough.
- In-document (and even corpus) statistics provide additional information but have an upper bound if used without domain information.
- Detail and explicitness at the level of concept and relationships determine indexing performance.

The End
