Improving precision and recall in study retrieval
A concept for thesaurus-based syntactic indexing
Pascal Siegers and Tanja Friedrich, Data Archive for the Social Sciences at GESIS

What's inside?
1. Subject indexing at GESIS
2. Ambiguity treatment in indexing
3. A concept for syntactic indexing
4. Conclusion and outlook

The GESIS data catalogue
Archived studies are documented for retrieval and access (download/delivery).
Contains detailed study descriptions for approx. 5,500 studies.

Subject indexing at GESIS
Currently, GESIS does not use a thesaurus for study indexing.
Free keywording on variable level is employed.
Examples: Satisfaction with life (happiness); government should provide only basic health care services

Subject indexing at GESIS
The good: Indexing according to users' needs: question or variable level indexing allows retrieval of constructs for secondary analysis.
The bad: No controlled vocabulary (thesaurus): no control of semantic ambiguity in retrieval.

Examples for semantic ambiguity
Problem with synonyms: Users search for "guest" or "visitor", "enterprise" or "company", "organic farming" or "biological farming". Users will not obtain all relevant items.
Problem with homonyms: Users want to find one of "association (political, legal)" or "association (psychological)", "content (adjective)" or "content (noun)". Users will obtain irrelevant items.
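
The synonym problem can be illustrated with a small sketch. It assumes a hypothetical entry-term table (`PREFERRED_TERM`) and three made-up indexed items; the choice of preferred terms is an illustration only and is not taken from any actual thesaurus.

```python
# Hypothetical entry-term -> preferred-term table (assumption, not a real thesaurus).
PREFERRED_TERM = {
    "guest": "VISITOR",
    "visitor": "VISITOR",
    "enterprise": "COMPANY",
    "company": "COMPANY",
    "organic farming": "ORGANIC FARMING",
    "biological farming": "ORGANIC FARMING",
}

# Made-up items with free keywords as an indexer might have typed them.
free_keywords = {"q1": "guest", "q2": "visitor", "q3": "Visitor "}


def normalize(keyword):
    """Map a keyword to its preferred term; unknown keywords are only upper-cased."""
    key = keyword.strip().lower()
    return PREFERRED_TERM.get(key, key.upper())


def search_free(query):
    """Match the literal keyword only, as in free keywording."""
    return sorted(
        item for item, kw in free_keywords.items()
        if kw.strip().lower() == query.strip().lower()
    )


def search_controlled(query):
    """Match on the preferred term, so synonyms retrieve the same items."""
    target = normalize(query)
    return sorted(
        item for item, kw in free_keywords.items() if normalize(kw) == target
    )


free_hits = search_free("guest")              # only the literal match
controlled_hits = search_controlled("guest")  # all three synonymous items
print(free_hits, controlled_hits)
```

In this toy setting the free search finds one of three relevant items, while the controlled search finds all three, which is the recall gain a thesaurus is expected to bring.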

Results of semantic ambiguity
False associations and a tendency towards low recall in retrieval.

Solution: Employ a thesaurus
To tackle the semantic ambiguity while retaining specificity (in-depth indexing on question or variable level).
But be careful not to introduce syntactic ambiguity.

Example for syntactic ambiguity
Study: Nation-wide public opinion survey of U.S. attitudes on the Middle East
Index terms: PUBLIC OPINION, TELEPHONE SURVEYS, UNITED STATES, ATTITUDES, MIDDLE EAST
Attitudes towards the Middle East in the United States? OR attitudes towards the United States in the Middle East?
(Lancaster 1998, Indexing and Abstracting in Theory and Practice, 2nd ed. London: Library Association Publishing, p. 7.)

Results of syntactic ambiguity
False associations and a tendency towards low precision in retrieval.
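
A minimal sketch of why a flat term list cannot separate the two readings: both information needs reduce to the same set of terms, so they retrieve the same study. The identifier `study_A` and the function `flat_search` are hypothetical names used only for illustration.

```python
# Flat index for the Lancaster example; "study_A" is a made-up identifier.
FLAT_INDEX = {
    "study_A": {
        "PUBLIC OPINION",
        "TELEPHONE SURVEYS",
        "UNITED STATES",
        "ATTITUDES",
        "MIDDLE EAST",
    },
}


def flat_search(query_terms):
    """Return every study whose flat term set contains all query terms."""
    return sorted(
        study for study, terms in FLAT_INDEX.items() if query_terms <= terms
    )


# Need 1: attitudes in the United States towards the Middle East.
need_1 = {"ATTITUDES", "UNITED STATES", "MIDDLE EAST"}
# Need 2: attitudes in the Middle East towards the United States.
need_2 = {"ATTITUDES", "MIDDLE EAST", "UNITED STATES"}

hits_1 = flat_search(need_1)
hits_2 = flat_search(need_2)
print(need_1 == need_2, hits_1, hits_2)
```

Because the two queries are identical as sets, the study is returned for both needs although it is relevant to only one of them, which lowers precision.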

Summing-up ambiguity treatment
The good: Use of a thesaurus reduces semantic ambiguity and improves recall.
The bad: Abandoning the free keywording increases syntactic ambiguity and lowers precision.

Solution: Employ a syntax
Tackle the bad in the present indexing: employ a thesaurus.
Take the good in the present indexing: in-depth indexing on question or variable level.
Thesaurus-based syntactic indexing
Our syntax will be composed of term linking and role operators.
Both are common syntactic rules in indexing.

Syntax: Term linking
Not just one flat term list:
PUBLIC OPINION, TELEPHONE SURVEYS, UNITED STATES, ATTITUDES, MIDDLE EAST, ISRAEL, EGYPT, ARAB NATIONS, PALESTINE LIBERATION ORGANISATION, PEACE CONFERENCES, PEACE, PALESTINIAN STATE, FOREIGN AID, POLITICAL LEADERS
Terms that refer to one question or variable are grouped/linked, like
PALESTINIAN STATE; ATTITUDES
MIDDLE EAST; PEACE; ESTIMATION

Syntax: Role operators
Terms are classified as directive terms and subject terms.
This allows measurable constructs to be identified, like
attitudes towards an independent Palestinian state
estimation of peace in the Middle East

Role operators
Social science construct (measurable unit)
Contents/topics (subject): any subject area relevant in social science, e.g. work, family, religion, education
Attributes (direction): Cognition, Evaluation, Affection, Action, [objective characteristics]

Subject and directive terms
Subject terms: Specify the contents of the measurement. As specific as possible. Combinations of terms, if necessary.
Directive terms: Specify the attributes of the measurement. Limited heterogeneity in directive terms to facilitate faceted retrieval.
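
Term linking and role operators can be sketched as a small data structure. The class `LinkedEntry` and the two matching functions are hypothetical names; the two entries are the linked groups from the Middle East example, with ATTITUDES and ESTIMATION treated as the directive terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedEntry:
    """One measurable unit: a directive term linked to its subject terms."""
    directive: str
    subjects: frozenset


# The two linked groups from the Middle East example.
entries = [
    LinkedEntry("ATTITUDES", frozenset({"PALESTINIAN STATE"})),
    LinkedEntry("ESTIMATION", frozenset({"MIDDLE EAST", "PEACE"})),
]


def flat_match(entries, directive, subject):
    """Ignore links and roles: both terms only need to occur somewhere."""
    all_terms = set()
    for entry in entries:
        all_terms |= {entry.directive} | entry.subjects
    return directive in all_terms and subject in all_terms


def linked_match(entries, directive, subject):
    """Respect links and roles: both terms must occur in the same unit."""
    return any(
        entry.directive == directive and subject in entry.subjects
        for entry in entries
    )


# "Attitudes towards peace" is not measured by either question.
false_hit = flat_match(entries, "ATTITUDES", "PEACE")
no_hit = linked_match(entries, "ATTITUDES", "PEACE")
true_hit = linked_match(entries, "ATTITUDES", "PALESTINIAN STATE")
print(false_hit, no_hit, true_hit)
```

The flat match reports a hit for a construct that was never measured, while the linked match accepts only combinations that belong to the same question or variable.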

Examples for directive terms
Cognition: PERCEPTION, KNOWLEDGE, AWARENESS, INTEREST, BELIEF, ORIENTATION
Evaluation: ATTITUDE, PREFERENCE, JUDGMENT, PREJUDICE, SATISFACTION, ACCEPTANCE/APPROVAL, REJECTION/REFUSAL
Affection: MOOD, FEAR, ANGER/ANNOYANCE, HAPPINESS, HATE, LOVE
Action: BEHAVIOR, USE/UTILIZATION, CHOICE, EXPERIENCE, INTERACTION, ACTIVITY, CONSTRUCTION/DESTRUCTION

The complete syntax
Measurable unit (e.g. survey question): subject term(s) (ST), directive terms (DT)
Precoordination/syntactic indexing = linked terms that are specified by role operators

Examples: corruption
"There is corruption in the national public institutions in Germany." (Eurobarometer 76.1; ZA5565)
Directive term: PERCEPTION
Subject term(s): CORRUPTION, PUBLIC INSTITUTIONS
Syntactic indexing: PERCEPTION; CORRUPTION; PUBLIC INSTITUTIONS

"Are you personally affected by corruption in your daily activities?" (Eurobarometer 76.1; ZA5565)
Directive term: EXPERIENCE
Subject term(s): CORRUPTION
Syntactic indexing: EXPERIENCE; CORRUPTION

Conclusion
Thesaurus-based syntactic indexing helps us to reduce semantic ambiguity while we retain our level of specificity and depth in indexing.

Outlook
Our concept improves our study descriptions and enables new retrieval techniques.
Syntactic indexing can enhance faceted retrieval: refine search results by subject or directive terms.

Refine search / narrow results
Refine by topic of study: International politics (19); Conflict, security and peace (18); Society, culture (10)
Refine by questions
Refine by subject: Middle East (20); Conflict (19); Israel (19); Peace (19); Palestinian State (19); USA (18); Egypt (17)
Refine by intention: Attitude (15); Behaviour (12); Knowledge (9)
Refine by country: USA (15); Israel (10); France (8); Australia (3)
Refine by time period: 2003 (10); 2012 (5); 2008 (5); 2010 (3)

Thank you for your attention!
Dr. Pascal Siegers
Tanja Friedrich
[email protected]
[email protected]
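
Backup sketch: faceted refinement over syntactic index entries. This is an illustration under stated assumptions, not the GESIS implementation. The identifiers `question_1` and `question_2`, the helper names, and the rule that a term counts as directive when it appears in a controlled list are all hypothetical; the two index strings are the corruption examples, and `DIRECTIVE_TERMS` holds only a subset of the directive terms listed earlier.

```python
from collections import Counter

# Subset of the directive terms, mapped to their facet (assumed lookup table).
DIRECTIVE_TERMS = {
    "PERCEPTION": "Cognition",
    "ATTITUDE": "Evaluation",
    "EXPERIENCE": "Action",
}

# The two corruption examples; the question identifiers are made up.
INDEX = {
    "question_1": "PERCEPTION; CORRUPTION; PUBLIC INSTITUTIONS",
    "question_2": "EXPERIENCE; CORRUPTION",
}


def parse(entry):
    """Split an index string into (directive terms, subject terms)."""
    terms = [term.strip() for term in entry.split(";")]
    directives = [t for t in terms if t in DIRECTIVE_TERMS]
    subjects = [t for t in terms if t not in DIRECTIVE_TERMS]
    return directives, subjects


def facet_counts(index):
    """Count how many questions carry each directive and each subject term."""
    by_directive, by_subject = Counter(), Counter()
    for entry in index.values():
        directives, subjects = parse(entry)
        by_directive.update(directives)
        by_subject.update(subjects)
    return by_directive, by_subject


def refine(index, directive=None, subject=None):
    """Keep only questions that match the selected facet values."""
    hits = []
    for question, entry in index.items():
        directives, subjects = parse(entry)
        if directive is not None and directive not in directives:
            continue
        if subject is not None and subject not in subjects:
            continue
        hits.append(question)
    return sorted(hits)


by_directive, by_subject = facet_counts(INDEX)
all_corruption = refine(INDEX, subject="CORRUPTION")
experienced = refine(INDEX, directive="EXPERIENCE", subject="CORRUPTION")
print(by_subject["CORRUPTION"], all_corruption, experienced)
```

Selecting a subject facet first and a directive facet second narrows the corruption questions to the one that measures personal experience, which mirrors the "refine by subject" and "refine by intention" steps in the outlook.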