OCLC Research Update: ALA Annual 2015

ALA Annual 2015
OCLC Research Update

• Merrilee Proffitt, Senior Program Officer
• Bruce Washburn, Consulting Software Engineer
• Diane Vizine-Goetz, Senior Research Scientist
• Roy Tennant, MC and Senior Program Officer

• Merrilee Proffitt: Wikipedia and Libraries
• Bruce Washburn: On the Linked Data Learning Curve
• Diane Vizine-Goetz: FAST (Faceted Application of Subject Terminology)

OCLC Research in Brief
Roy Tennant, Senior Program Officer

OCLC Research
• Explores challenges facing libraries and archives in a rapidly changing information technology environment
• Three primary modes of activity:
  • Community research & development
  • Advanced development (data mining, prototyping)
  • Member/Partner engagement

• OCLC Research Library Partnership
• Work is openly available, e.g.,
  • Reports
  • Experimental services

THEMES

New Report: Stewardship of the Evolving Scholarly Record
• The scholarly record is evolving, so stewardship models for the scholarly record are changing too
• Conscious coordination is key to securing the future of the scholarly record
• oc.lc/esr-stewardship

Library Linked Data in the Cloud
• Just published
• Offers insights gained from OCLC's innovative work with linked data
• Main sections:
  • Library Standards and the Semantic Web
  • Modeling Library Authority Files
  • Modeling and Discovering Creative Works
  • Entity Identification Through Text Mining
  • The Library Linked Data Cloud
• Technical but approachable
• Anyone with a modest background in metadata can read and understand it

Wikipedia & Libraries
Increasing Library Visibility
Merrilee Proffitt, Senior Program Officer, OCLC Research

"Discovery happens elsewhere" (Lorcan Dempsey, OCLC Research)

Why Wikipedia?
• 35 million articles
• 286 languages
• 2 billion edits (11 million / month)
• 8000 views per second
• 500 million monthly visitors
• 5th most popular website
• 2000x larger than Britannica

Why Wikipedia?
• Starting point for research
• Learning black market and GWR: Google > Wikipedia > References
• Ideologically aligned with library mission
  • Access to knowledge for free
  • Shared appreciation of quality sources
  • Shared appreciation of authority control

Wikipedia + Libraries

How to engage?
• Learn to edit Wikipedia
• Attend or host an editing event
• Host a Wikipedian in Residence
• Host a Wikipedia Visiting Scholar
• Consider the unique value that libraries and librarians can bring to Wikipedia

For more information / inspiration
• Wiki Libraries
• GLAM Wiki
• Wikipedia Library
• WikiEdu (Wiki Education Foundation)

OCLC Research Update, June 29, 2015
On the Linked Data Learning Curve
Current work in OCLC Research
Bruce Washburn, Consulting Software Engineer

The Knowledge Graph
• A 2012 Google blog post describes the Knowledge Graph, which supports searching for the things, people, and places that Google knows about and suggests relevant related things.
• The Graph powers the Google Knowledge Panel in search results.

The Google Knowledge Vault
A series of recent Google Research papers describes the use of probabilistic models and machine learning to assess the truth of statements made by multiple sources.
• Li, X., Dong, X. L., Lyons, K., Meng, W., & Srivastava, D. (2013). Truth Finding on the Deep Web: Is the Problem Solved?
• Dong, X. L., Gabrilovich, E., Heitz, G., Horn, W., Murphy, K., Sun, S., & Zhang, W. (2013). From Data Fusion to Knowledge Fusion.
• Dong, X. L., Murphy, K., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., ... & Zhang, W. (2014). Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion.
• Dong, X. L., Gabrilovich, E., Murphy, K., Dang, V., Horn, W., & Zhang, W. (2015). Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources.

A Knowledge Vault for Libraries?
OCLC research scientists and software engineers are prototyping a similar model for bibliographic and authority data sources, combined with user-contributed content and Linked Data from other providers, to evaluate a knowledge vault for statements about entities and their relationships, including people, groups, places, events, concepts, and works.

Library data sources
• WorldCat: thousands of libraries, museums, and archives contribute to the aggregation. OCLC adds FRBR clustering, algorithmically deduced connections of strings to Linked Data identifiers, and new entities.
• VIAF: 30 or more authority systems contribute, and OCLC merges and links records into new VIAF clusters.
• FAST: OCLC transforms Library of Congress subject headings into a new controlled vocabulary, friendly to faceted navigation.

Knowledge Vault data flow
[Diagram: data sources (Enhanced WorldCat, VIAF, FAST) → extractors (extraction) → knowledge triples → fusers (collective fusion) → scored triples → Knowledge Vault]
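The data flow above can be illustrated with a minimal sketch. It assumes each extractor emits (subject, predicate, object) triples grouped by source, and it substitutes a simple source-voting score for the fusion step. The presentation does not describe how the prototype's fusers work, so the names `Triple` and `fuse` and all sample values are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def fuse(extracted):
    """Score each distinct triple by the share of sources asserting it.

    `extracted` maps a source name to the triples its extractor produced.
    This voting rule is an illustrative stand-in, not the fusion model
    used in the prototype or in the Google papers.
    """
    sources_per_triple = defaultdict(set)
    for source, triples in extracted.items():
        for triple in triples:
            sources_per_triple[triple].add(source)
    total = len(extracted)
    return {t: len(s) / total for t, s in sources_per_triple.items()}


# Hypothetical extractor output for three data sources.
extracted = {
    "WorldCat": [Triple("person:1", "birthYear", "1920")],
    "VIAF": [Triple("person:1", "birthYear", "1920")],
    "FAST": [Triple("person:1", "birthYear", "1921")],
}
scored = fuse(extracted)
```

Here the statement asserted by two of the three sources receives the higher score. A probabilistic model would also weigh how reliable each source and extractor has been, which this sketch ignores.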

The EntityJS Research Project
Get some real-life RDF experience, test entity refinement and editing, and push triples back to the knowledge vault.

Testing with a subset of Knowledge
Just the ArchiveGrid WorldCat MARC records
[Diagram: the Knowledge Vault data flow with data sources WorldCat, VIAF, FAST, and ArchiveGrid, extended with Vault Services, the EntityJS application, application triples, and extractors for DBpedia and Wikidata]

• Search across entities
• Show related entities
• Get more relationships from Wikidata
• EntityJS users can identify matching entities
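As a rough illustration of what pushing a user-confirmed match back as a triple might look like, the sketch below serializes an `owl:sameAs` statement as one N-Triples line. The presentation does not say which predicates or serializations EntityJS uses, so the choice of `owl:sameAs`, the function name, and both entity URIs are assumptions.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_ntriple(entity_uri: str, matching_uri: str) -> str:
    """Return one N-Triples line stating that two URIs name the same entity."""
    for uri in (entity_uri, matching_uri):
        # These characters may not appear unescaped inside an N-Triples IRI.
        if any(ch in uri for ch in '<>" {}|\\^`'):
            raise ValueError(f"not a usable IRI: {uri!r}")
    return f"<{entity_uri}> <{OWL_SAME_AS}> <{matching_uri}> ."


# Placeholder URIs, not real identifiers.
line = same_as_ntriple(
    "http://example.org/vault/entity/123",
    "http://example.org/other/entity/abc",
)
```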

As the EntityJS research project continues
We will explore other ways to edit and refine entity data and experiment with using the knowledge vault to support data visualizations.

Keeping up with EntityJS
• Track initial progress reports on the OCLC Research blog at hangingtogether.org.
• We'll provide project details on the OCLC Research website soon.
• Questions or comments? Bruce Washburn ([email protected]) or Jeff Mixter ([email protected]) at OCLC Research

FAST (Faceted Application of Subject Terminology)
Diane Vizine-Goetz, Senior Research Scientist

Basics
• Enumerative, faceted subject heading vocabulary
• Derived primarily from Library of Congress Subject Headings (LCSH), LC NACO file, and LC Genre/Form terms
• Retains the vocabulary and reference structure of the source files

Eight categories of terms
• Persons
• Organizations
• Events
• Titles of works
• Chronological/Time periods
• Topics
• Places
• Form/Genre terms

Why FAST?
• Developed to meet a need for a general subject vocabulary that is easy to learn, apply, and control
• Modern design
  • All headings established in authority file
  • Persistent identifiers for all headings
  • Obsolete headings deprecated, not deleted
  • Authority file structure facilitates application and automated maintenance of headings
  • Faceted-navigation friendly
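The "deprecated, not deleted" design point can be sketched in a few lines. This is an illustration only, not the FAST authority file's actual structure: the class names, the `replaced_by` link, and the identifiers `example-1` and `example-2` are hypothetical, and the replacement relationship between the two sample headings is assumed for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Heading:
    identifier: str  # persistent identifier, never reused
    label: str
    facet: str
    obsolete: bool = False
    replaced_by: Optional[str] = None  # identifier of a preferred heading, if any


class AuthorityFile:
    def __init__(self):
        self._headings = {}

    def add(self, heading: Heading) -> None:
        self._headings[heading.identifier] = heading

    def get(self, identifier: str) -> Heading:
        return self._headings[identifier]

    def deprecate(self, identifier: str, replaced_by: Optional[str] = None) -> None:
        """Mark a heading obsolete while keeping its record and identifier."""
        heading = self._headings[identifier]
        heading.obsolete = True
        heading.replaced_by = replaced_by

    def resolve(self, identifier: str) -> Heading:
        """Follow replacement links from an obsolete heading to a current one."""
        heading = self._headings[identifier]
        seen = {identifier}
        while heading.obsolete and heading.replaced_by is not None:
            if heading.replaced_by in seen:
                raise ValueError("replacement cycle")
            seen.add(heading.replaced_by)
            heading = self._headings[heading.replaced_by]
        return heading


authority = AuthorityFile()
authority.add(Heading("example-1", "Cancer--Patients", "Topics"))
authority.add(Heading("example-2", "Cancer--Patients--Biography", "Topics"))
authority.deprecate("example-2", replaced_by="example-1")
current = authority.resolve("example-2")
```

Because the obsolete record stays in place, a bibliographic record that still carries the old identifier can be found and updated automatically instead of breaking.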

Responsible parties
• Began as a collaboration of OCLC Research and the Library of Congress
• OCLC Research & advisory groups
• WorldCat quality management team
• FAST users (e.g., Cornell University, Australian Policy Online)
• ALCTS Faceted Subject Access Interest Group

Facet Counts, 5 June 2015
Facet | Count
Persons | 692,734
Organizations | 360,571
Events | 12,417
Titles of Works | 63,074
Chronological/Time periods | 676
Topics | 406,873
Places | 176,774
Form/Genre | 2,507
Total | 1,715,626

FAST in MARC Bibliographic Records

Headings before conversion
600 10 $a Lacks, Henrietta, $d 1920-1951 $x Health.
650 #0 $a Cancer $x Patients $z Virginia $v Biography.
650 #0 $a African American women $x History.
650 #0 $a Human experimentation in medicine $z United States $x History.
650 #0 $a HeLa cells.
650 #0 $a Cancer $x Research.
650 #0 $a Cell culture.
650 #0 $a Medical ethics.

FAST headings after conversion
600 17 $a Lacks, Henrietta, $d 1920-1951 $2 fast $0 (OCoLC)fst01914767
650 #7 $a African American women $2 fast $0 (OCoLC)fst00799438
650 #7 $a Cancer $x Patients $2 fast $0 (OCoLC)fst00845411
650 #7 $a Cancer $x Research $2 fast $0 (OCoLC)fst00845497
650 #7 $a Cell culture $2 fast $0 (OCoLC)fst00850172
650 #7 $a Health $2 fast $0 (OCoLC)fst00952743
650 #7 $a HeLa cells $2 fast $0 (OCoLC)fst00952578
650 #7 $a Human experimentation in medicine $2 fast $0 (OCoLC)fst00963042
650 #7 $a Medical ethics $2 fast $0 (OCoLC)fst01014081
651 #7 $a United States $2 fast $0 (OCoLC)fst01204155
651 #7 $a Virginia $2 fast $0 (OCoLC)fst01204597
655 #7 $a Biography $2 fast $0 (OCoLC)fst01423686
655 #7 $a History $2 fast $0 (OCoLC)fst01411628

http://experimental.worldcat.org/fast/963042/

Facet | FAST headings after conversion
Person | 600 17 $a Lacks, Henrietta, $d 1920-1951
Topic | 650 #7 $a African American women
Topic | 650 #7 $a Cancer $x Patients
Topic | 650 #7 $a Cancer $x Research
Topic | 650 #7 $a Cell culture
Topic | 650 #7 $a Health
Topic | 650 #7 $a HeLa cells
Topic | 650 #7 $a Human experimentation in medicine
Topic | 650 #7 $a Medical ethics
Place | 651 #7 $a United States
Place | 651 #7 $a Virginia
Form/Genre | 655 #7 $a Biography
Form/Genre | 655 #7 $a History

FAST and Authority Files
FAST | Facet | WC usage | LCSH
Cancer--Patients | topic | 13,564 | Cancer x Patients [150]
Cancer--Patients--Attitudes | topic | 155 |
Cancer--Patients--Biography [obsolete] | topic | | Cancer x Patients v Biography [150]
Cancer--Patients--Care | topic | 552 |
Cancer--Patients--Conduct of life | topic | 24 |
Cancer--Patients--Counseling of | topic | 85 |
Cancer--Patients--Dental care | topic | 24 |
Cancer--Patients--Economic conditions | topic | 40 | Cancer x Patients x Economic conditions [150]
Cancer--Patients--Education | topic | 45 |
Cancer--Patients--Employment | topic | 41 |
Cancer--Patients--Family relationships | topic | 1,274 | Cancer x Patients x Family relationships [150]
 | | | Cancer x Patients v Fiction [150]
Cancer--Patients--Home care | topic | 252 | Cancer x Patients x Home care [150]
Cancer--Patients--Home care--Planning | topic | 4 |
Cancer--Patients--Hospital care | topic | 132 | Cancer x Patients x Hospital care [150]
Cancer--Patients--Hospital care--Planning | topic | 5 |
Cancer--Patients--Legal status, laws, etc. | topic | 24 |
Cancer--Patients--Long-term care | topic | 56 | Cancer x Patients x Long-term care [150]
Cancer--Patients--Long-term care--History [obsolete] | topic | |
Cancer--Patients--Medical care | topic | 105 |
Cancer--Patients--Mental health | topic | 110 |
Cancer--Patients--Mental health services | topic | 5 |
Cancer--Patients--Nutrition | topic | 76 |
Cancer--Patients--Pastoral counseling of | topic | 39 |
Cancer--Patients--Psychological aspects | topic | 129 |
Cancer--Patients--Psychology | topic | 394 |
Cancer--Patients--Rehabilitation | topic | 799 | Cancer x Patients x Rehabilitation [150]
Cancer--Patients--Rehabilitation--Societies, etc. | topic | 5 |
Cancer--Patients--Religious life | topic | 255 | Cancer x Patients x Religious life [150]
Cancer--Patients--Research | topic | 19 |
Cancer--Patients--Services for | topic | 608 |
Cancer--Patients--Sexual behavior | topic | 49 |
Cancer--Patients--Social conditions | topic | 105 | Cancer x Patients x Social conditions [150]
Cancer--Patients--Social networks | topic | 35 |
Cancer--Patients--Treatment | topic | 55 |

Cancer x Patients z United States v Biography

Tools for Application and Use
• assignFAST: service that automates the manual selection of FAST Subjects based on autosuggest technology
• searchFAST: search interface to the FAST authority file that simplifies the process of heading selection
• FASTConverter: web application that converts LCSH headings to FAST headings; it helps users become familiar with FAST and see the differences between LCSH and FAST
• FAST Linked Data API: Linked Data descriptions expressed using SKOS (Simple Knowledge Organization System) and Schema.org
• WorldShare Record Manager: uses assignFAST API in a feature to apply FAST headings

FAST Datasets
• Available under Open Data Commons Attribution License (ODC-By)
• Bulk downloads updated quarterly
  • MARC Authority Format in XML
  • MARC Authority Format in ISO MARC
  • RDF/XML
• Change files published between updates
  • MARC Authority Format in ISO MARC
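The converted record shown earlier under "FAST headings after conversion" can also be read programmatically. The following is a minimal sketch, assuming the fields are available as display strings in the notation used on the slides (tag, indicators, `$`-prefixed subfields). The function names are hypothetical, the tag-to-facet table covers only the four tags that appear in the example, and the URI pattern is inferred from the single URL shown on the slide (http://experimental.worldcat.org/fast/963042/ for fst00963042), so treat it as an assumption.

```python
import re

# Only the four tags that appear in the example record.
FACET_BY_TAG = {"600": "Person", "650": "Topic", "651": "Place", "655": "Form/Genre"}

# Captures the numeric part of an identifier such as (OCoLC)fst00963042.
FAST_ID = re.compile(r"^\(OCoLC\)fst0*(\d+)$")


def parse_field(line):
    """Split a displayed MARC field into tag, indicators, and subfields."""
    tag, rest = line[:3], line[3:]
    indicators, _, body = rest.partition("$")
    subfields = []
    for chunk in body.split("$"):
        chunk = chunk.strip()
        if chunk:
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, indicators.strip(), subfields


def fast_heading(line):
    """Return (facet, label, uri) for a FAST heading, or None for anything else."""
    tag, _, subfields = parse_field(line)
    codes = dict(subfields)  # the example fields never repeat a subfield code
    if codes.get("2") != "fast" or tag not in FACET_BY_TAG:
        return None
    label = ""
    for code, value in subfields:
        if code in ("0", "2"):
            continue  # control subfields are not part of the heading text
        if not label:
            label = value
        elif code in "vxyz":
            label += "--" + value  # subdivisions are joined with a double hyphen
        else:
            label += " " + value
    match = FAST_ID.match(codes.get("0", ""))
    uri = f"http://experimental.worldcat.org/fast/{match.group(1)}/" if match else None
    return FACET_BY_TAG[tag], label, uri


example = fast_heading(
    "650 #7 $a Human experimentation in medicine $2 fast $0 (OCoLC)fst00963042"
)
```

Grouping the returned tuples by their first element reproduces the facet view of the record, which is what makes FAST headings convenient for faceted navigation.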

Links to other Files
[Table with columns "Authority" and "Count". Authorities listed: Library of Congress Subject Headings*, LC NACO File, VIAF, DBpedia/Wikipedia, Geonames, Total geographic coordinates. Counts as extracted, row alignment not preserved: 1,213,647; 1,213,232; 299,172; 160,837; 85,422; 120,561]
*One-to-One only, not including references to partial or pattern headings

Where FAST is used
• Bodleian Libraries, University of Oxford (U.K.)
• British Library (U.K., testing FAST)
• Chronicling Illinois & The Papers of Abraham Lincoln projects (U.S.A.)
• Cornell University Libraries (U.S.A.)
• Databib.org (U.S.A.)
• National Library of New Zealand (New Zealand)
• OCLC (U.S.A.)
• RMIT Publishing (Australia)
• University of North Dakota (U.S.A.)

FAST at OCLC
• WorldCat, January 2015: 76 million records enhanced with FAST
• WorldCat Entities > WorldCat Works
• Experimental WorldCat Linked Data (includes DDC, FAST, VIAF and LCSH URIs)
• Experimental applications (OCLC Research)
  • Classify
  • WorldCat Identities
  • mapFAST

FAST in Classify

What's new?
• FAST geographic headings in VIAF
• Synchronizing FAST forms with LCGFT
• FAST Changes page: http://fast.oclc.org/fastChanges/

What's next?
• Implementation of Machine-generated Metadata Provenance field (MARC 883) in 6xx headings in WorldCat (expected August 2015)
  • Preserve user-added FAST headings
  • Facilitate updating of machine-generated headings
• Under consideration/development
  • User-defined subsets
  • Support for local authority files
  • More links to Wikipedia

FAST Team
• OCLC Research: Rick Bennett, Eric Childress, Kerre Kammerer, Diane Vizine-Goetz
• WorldCat Quality Management: Robert Bremer, Linda Gabel

Links
• Project page: http://www.oclc.org/research/themes/data-science/fast.html
• Tools:
  • http://fast.oclc.org/searchfast/
  • http://experimental.worldcat.org/fast/fastconverter/
  • http://experimental.worldcat.org/fast/assignfast/ (+ API)
  • http://experimental.worldcat.org/fast/ (+ API)
• Datasets: http://www.oclc.org/research/themes/data-science/fast/download.html

We Welcome Your Engagement
• http://www.oclc.org/research
• https://twitter.com/OCLC/lists/oclc-research
• https://www.facebook.com/OCLCResearch
• http://www.slideshare.net/oclcr
• News & events
• Reports & presentations
• http://hangingtogether.org/
• http://youtube.com/oclcresearch
