Folie 1 - York University Libraries

Improving precision and recall in study retrieval
A concept for thesaurus-based syntactic indexing
Pascal Siegers and Tanja Friedrich, Data Archive for the Social Sciences at GESIS

What's inside?
1. Subject indexing at GESIS
2. Ambiguity treatment in indexing
3. A concept for syntactic indexing
4. Conclusion and outlook

The GESIS data catalogue
Archived studies are documented for retrieval and access (download/delivery).
Contains detailed study descriptions for approx. 5,500 studies.

Subject indexing at GESIS
Currently, GESIS does not use a thesaurus for study indexing.
Free keywording on variable level is employed.
Examples:
  Satisfaction with life (happiness)
  government should provide only basic health care services

Subject indexing at GESIS
The good: Indexing according to users' needs: question or variable level indexing allows retrieval of constructs for secondary analysis.
The bad: No controlled vocabulary (thesaurus): no control of semantic ambiguity in retrieval.

Examples for semantic ambiguity
Problem with synonyms: Users search for
  guest or visitor
  enterprise or company
  organic farming or biological farming
Users will not obtain all relevant items.
Problem with homonyms: Users want to find one of
  association (political, legal) or association (psychological)
  content (adjective) or content (noun)
Users will obtain irrelevant items.

Results of semantic ambiguity
False associations and a tendency towards low recall in retrieval

Solution: Employ a thesaurus
To tackle the semantic ambiguity while retaining specificity (in-depth indexing on question or variable level)
But be careful not to gain syntactic ambiguity

Example for syntactic ambiguity
Nation-wide public opinion survey of U.S. attitudes on the Middle East
PUBLIC OPINION; TELEPHONE SURVEYS; UNITED STATES; ATTITUDES; MIDDLE EAST
Attitudes towards Middle East in the United States?
OR
Attitudes towards United States in the Middle East?
Lancaster 1998, Indexing and Abstracting in Theory and Practice, 2nd Ed. London, Library Association Publishing, p. 7.

Results of syntactic ambiguity
False associations and a tendency towards low precision in retrieval
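The thesaurus solution to the synonym problem can be illustrated with a minimal Python sketch. It is not the GESIS implementation: the synonym pairs are taken from the slides, while the choice of preferred terms, the variable IDs, and the function names are assumptions made for illustration.

```python
# Hypothetical mini-thesaurus: each entry term maps to one preferred term.
# The synonym pairs come from the slides; which term is "preferred" is assumed.
PREFERRED = {
    "guest": "VISITOR",
    "visitor": "VISITOR",
    "enterprise": "COMPANY",
    "company": "COMPANY",
    "organic farming": "ORGANIC FARMING",
    "biological farming": "ORGANIC FARMING",
}


def normalize(term):
    """Map a free keyword or query term to its preferred thesaurus term."""
    key = term.strip().lower()
    return PREFERRED.get(key, key.upper())


# Free keywords assigned to three hypothetical variables.
free_keywords = {"v1": "guest", "v2": "visitor", "v3": "company"}


def search_free(query):
    """Exact match on free keywords: synonyms are missed (low recall)."""
    return sorted(v for v, kw in free_keywords.items() if kw == query)


def search_thesaurus(query):
    """Match on preferred terms: synonyms collapse to one term."""
    target = normalize(query)
    return sorted(v for v, kw in free_keywords.items() if normalize(kw) == target)
```

With free keywords, a search for "guest" finds only the variable indexed with exactly that word; after normalization, it also finds the one indexed with "visitor".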

Summing-up ambiguity treatment
The good: Use of a thesaurus reduces semantic ambiguity and improves recall.
The bad: Abandoning the free keywording increases syntactic ambiguity and lowers precision.

Solution: Employ a syntax
Tackle the bad in the present indexing: employ a thesaurus
Take the good in the present indexing: in-depth indexing on question or variable level
Thesaurus-based syntactic indexing

Solution: Employ a syntax
Our syntax will be composed of term linking and role operators.
Both are common syntactic rules in indexing.

Syntax: Term linking
Not just one flat term list:
PUBLIC OPINION; TELEPHONE SURVEYS; UNITED STATES; ATTITUDES; MIDDLE EAST; ISRAEL; EGYPT; ARAB NATIONS; PALESTINE LIBERATION ORGANISATION; PEACE CONFERENCES; PEACE; PALESTINIAN STATE; FOREIGN AID; POLITICAL LEADERS

Term linking
Terms that refer to one question or variable are grouped/linked, like
  PALESTINIAN STATE; ATTITUDES
  MIDDLE EAST; PEACE; ESTIMATION

Syntax: Role operators
Terms are classified as directive terms and subject terms.
This makes it possible to identify measurable constructs, like
  attitudes towards an independent Palestinian state
  estimation of peace in the Middle East

Role operators
Social science construct (measurable unit)
  Contents/topics (subject): any subject area relevant in social science, e.g. work, family, religion, education
  Attributes (direction): Cognition; Evaluation; Affection; Action; [objective characteristics]

Subject and directive terms
Subject terms
  Specify the contents of the measurement
  As specific as possible
  Combinations of terms, if necessary
Directive terms
  Specify the attributes of the measurement
  Limited heterogeneity in directive terms to facilitate faceted retrieval
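Term linking can be sketched in a few lines of Python. The two term groups are the linked examples from the slides; the data layout and the function names are assumptions, and the sketch only shows why a flat list produces false associations that linked groups avoid.

```python
# Index terms of one study, grouped per question (term linking).
# The two groups are the linked examples from the slides.
linked_index = [
    {"PALESTINIAN STATE", "ATTITUDES"},
    {"MIDDLE EAST", "PEACE", "ESTIMATION"},
]

# The same terms merged into one flat list, as in indexing without a syntax.
flat_index = set().union(*linked_index)


def matches_flat(query):
    """True if all query terms occur anywhere in the study's flat term list."""
    return set(query) <= flat_index


def matches_linked(query):
    """True only if all query terms occur together in one linked group."""
    return any(set(query) <= group for group in linked_index)
```

A query for ATTITUDES and PEACE matches the flat list although no single question measures attitudes towards peace; the linked index rejects it, while ATTITUDES and PALESTINIAN STATE still match.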

Examples for directive terms
Cognition: PERCEPTION; KNOWLEDGE; AWARENESS; INTEREST; BELIEF; ORIENTATION
Evaluation: ATTITUDE; PREFERENCE; JUDGMENT; PREJUDICE; SATISFACTION; ACCEPTANCE/APPROVAL; REJECTION/REFUSAL
Affection: MOOD; FEAR; ANGER/ANNOYANCE; HAPPINESS; HATE; LOVE
Action: BEHAVIOR; USE/UTILIZATION; CHOICE; EXPERIENCE; INTERACTION; ACTIVITY; CONSTRUCTION/DESTRUCTION

The complete syntax
Measurable unit (e.g. survey question): subject term(s) (ST), directive terms (DT)
Precoordination/syntactic indexing = linked terms that are specified by role operators

Examples: corruption
"There is corruption in the national public institutions in Germany." (Eurobarometer 76.1; ZA5565)
  Directive term: PERCEPTION
  Subject term(s): CORRUPTION, PUBLIC INSTITUTIONS
  Syntactic indexing: PERCEPTION; CORRUPTION; PUBLIC INSTITUTIONS
"Are you personally affected by corruption in your daily activities?" (Eurobarometer 76.1; ZA5565)
  Directive term: EXPERIENCE
  Subject term(s): CORRUPTION
  Syntactic indexing: EXPERIENCE; CORRUPTION
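The two corruption examples can be expressed as a small data structure. This is a sketch under stated assumptions, not the authors' data model: the class name `IndexEntry`, its fields, the `find` helper, and the ordering rule in `as_string` (directive term first, subject terms alphabetically) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One measurable unit: a directive term plus its linked subject terms."""
    question: str
    directive: str       # role: directive term (DT)
    subjects: frozenset  # role: subject term(s) (ST)

    def as_string(self):
        # Directive term first, then subject terms sorted (ordering is assumed).
        return "; ".join([self.directive, *sorted(self.subjects)])


entries = [
    IndexEntry(
        "There is corruption in the national public institutions in Germany.",
        "PERCEPTION",
        frozenset({"CORRUPTION", "PUBLIC INSTITUTIONS"}),
    ),
    IndexEntry(
        "Are you personally affected by corruption in your daily activities?",
        "EXPERIENCE",
        frozenset({"CORRUPTION"}),
    ),
]


def find(directive, subject):
    """Return entries whose directive term matches and that carry the subject term."""
    return [e for e in entries if e.directive == directive and subject in e.subjects]
```

Because each term carries its role, a search for EXPERIENCE of CORRUPTION returns only the second question, even though both questions share the subject term CORRUPTION.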

Conclusion
Thesaurus-based syntactic indexing helps us to reduce semantic ambiguity while we retain our level of specificity and depth in indexing.

Outlook
Our concept improves our study descriptions and enables new retrieval techniques.

Outlook
Syntactic indexing can enhance faceted retrieval: refine search results by subject or directive terms.

Refine search / narrow results
Refine by topic of study
  International politics (19)
  Conflict, security and peace (18)
  Society, culture (10)
Refine by questions
  Refine by subject
    Middle East (20)
    Conflict (19)
    Israel (19)
    Peace (19)
    Palestinian State (19)
    USA (18)
    Egypt (17)
  Refine by intention
    Attitude (15)
    Behaviour (12)
    Knowledge (9)
Refine by country
  USA (15)
  Israel (10)
  France (8)
  Australia (3)
Refine by time period
  2003 (10)
  2012 (5)
  2008 (5)
  2010 (3)

Thank you for your attention!
Dr. Pascal Siegers, Tanja Friedrich
[email protected]
[email protected]
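The facet refinement described in the outlook could be derived from role-tagged index entries roughly as follows. This is a sketch only: the result records are invented toy data and do not reproduce the counts in the mock-up, and the function names are assumptions.

```python
from collections import Counter

# Toy result set: (directive term, subject terms) per matching question.
# Invented for illustration; not the data behind the slide's facet counts.
results = [
    ("ATTITUDE", {"MIDDLE EAST", "PEACE"}),
    ("ATTITUDE", {"PALESTINIAN STATE"}),
    ("KNOWLEDGE", {"MIDDLE EAST"}),
]


def facet_counts(hits):
    """Count hits per directive term and per subject term."""
    by_directive = Counter(d for d, _ in hits)
    by_subject = Counter(s for _, subjects in hits for s in subjects)
    return by_directive, by_subject


def refine(hits, directive=None, subject=None):
    """Narrow the hits by a directive term and/or a subject term."""
    return [
        (d, s)
        for d, s in hits
        if (directive is None or d == directive)
        and (subject is None or subject in s)
    ]
```

Keeping the set of directive terms small, as the slides recommend, keeps the "refine by intention" facet short enough to be usable.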
