Anatomy of Aggregate Collections: The Example of Google Print for Libraries

Brian Lavoie
Senior Research Scientist, OCLC Research
OCLC Members Council Meeting, October 2005

Aggregate collections
- Boundaries between local and external collections increasingly blurred
  - Resource sharing (digital/network technologies)
  - Cooperative collection management (resource allocation)
- Shift in focus to the resources of the system (or subsets of the system), rather than individual collections
- Need data to support/illuminate a system-wide perspective
  - Characterize/analyze aggregate collections

WorldCat: largest aggregate collection
- Aggregate holdings of more than 20,000 libraries
- Bridge from local to system-wide perspective

The system-wide print book collection as represented in WorldCat (January 2005)

  Total WorldCat records                                      ~55 million
  Language-based monographs                                   ~41 million
  Language-based monographs, excluding government
    documents and theses/dissertations                        ~35 million
  Language-based monographs, excluding government
    documents and theses/dissertations, in print format only  ~32 million print books

More information: http://www.oclc.org/research/presentations/lavoie/cni2005.ppt

Google Print for Libraries
- Aggregate collection of print books
- Focus on copyright issues; very little discussion of Google Print for Libraries as an aggregate collection
- Aggregate print book holdings of five major research libraries (Harvard, Michigan, Oxford, NYPL, and Stanford)
- What are the characteristics of this aggregate collection?
- How does it relate to the system-wide collection?
- WorldCat: useful data source for analysis
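The questions above (how much of the system-wide collection a group covers, and how much its members overlap) reduce to set operations on per-library holdings. The sketch below is a minimal illustration under stated assumptions, not OCLC's actual method: each library's holdings are modeled as a set of hypothetical book identifiers (in practice, WorldCat records reduced to print books), and "redundancy" is read as the share of holdings that duplicate a book already held by another library in the group. That reading is consistent with the presentation's figures (1 - 10.5/18 is about 0.42). The function names and toy data are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: library name -> set of (made-up) book identifiers.
holdings = {
    "Library A": {"b1", "b2", "b3"},
    "Library B": {"b2", "b3", "b4"},
    "Library C": {"b3", "b5"},
}
# Hypothetical system-wide collection (stands in for WorldCat print books).
system_wide = {f"b{i}" for i in range(1, 11)}


def coverage_stats(holdings, system_wide):
    """Coverage, overlap distribution, and redundancy for a group of libraries."""
    union = set().union(*holdings.values())  # distinct books held by the group
    total_holdings = sum(len(books) for books in holdings.values())
    # Number of libraries holding each distinct book.
    per_book = Counter(b for books in holdings.values() for b in books)
    # Number of distinct books held by exactly k libraries.
    by_k = Counter(per_book.values())
    return {
        "unique_books": len(union),
        "coverage": len(union & system_wide) / len(system_wide),
        "overlap": {k: by_k[k] / len(union) for k in sorted(by_k)},
        # Assumed definition: share of holdings duplicating a book held elsewhere.
        "redundancy": 1 - len(union) / total_holdings,
    }


def redundancy_from_overlap(shares):
    """Same redundancy measure, computed from shares of books held by k libraries."""
    mean_holders = sum(k * p for k, p in shares.items()) / sum(shares.values())
    return 1 - 1 / mean_holders


stats = coverage_stats(holdings, system_wide)
# Holdings-overlap shares reported for the Google 5 (held by 1..5 libraries).
g5_redundancy = redundancy_from_overlap({1: 0.61, 2: 0.20, 3: 0.10, 4: 0.06, 5: 0.03})
print(stats)
print(round(g5_redundancy, 2))
```

For the toy data, the group holds 5 distinct books (coverage 0.5), with overlap shares of 0.6, 0.2, and 0.2 and a redundancy of 0.375. Applied to the Google 5 overlap shares, the same measure gives about 0.41, close to the potential redundancy rate of 40 percent quoted on the slides.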

Lavoie, Connaway, Dempsey: "Anatomy of Aggregate Collections: The Example of Google Print for Libraries," D-Lib Magazine (September 2005). http://www.dlib.org/dlib/september05/lavoie/09lavoie.html

G5 coverage of system-wide print book collection
- Held by at least one G5 library: 33 percent
- Not held: 67 percent
- 10.5 million unique books

Holdings overlap (percent of unique G5 books, by number of G5 libraries holding them)
  Held by 1    61 percent
  Held by 2    20 percent
  Held by 3    10 percent
  Held by 4     6 percent
  Held by 5     3 percent
- Potential redundancy rate of 40 percent

Language distribution (proportion of books)
  Language      Google 5   System-wide
  English         0.49        0.52
  German          0.10        0.08
  French          0.08        0.08
  Spanish         0.05        0.06
  Chinese         0.04        0.04
  Russian         0.04        0.03
  Italian         0.03        0.03
  Japanese        0.02        0.04
  Hebrew          0.02        0.01
  Arabic          0.01        0.01
  Portuguese      0.01        0.01
  Polish          0.01        0.01
  Dutch           0.01        0.01
  Latin           0.01        0.01
  Korean          0.01        0.01
  Swedish         0.01       < 0.01
  All others      0.07        0.08
- More than 430 languages in Google 5 collection

Cumulative age distribution of G5 holdings
[Figure: proportion of G5 holdings published during or prior to a given year (0 to 1), plotted by year]
- More than 80 percent of Google 5 collection still in copyright

Works
                    Google 5        System-wide
  Manifestations    10.5 million    32 million
  Works              9.1 million    26.1 million
- Coverage slightly higher (35 percent)
- Holdings overlap slightly greater (56 percent held uniquely)

Some speculation
- What results would have been obtained if a different group of libraries had been selected?
- What incremental extensions to coverage can be obtained by adding additional library collections to the original Google 5?
- Chose 5 new libraries:
  - Small US liberal arts college
  - Large US public university
  - Large US private university
  - Large US metropolitan library
  - Large Canadian university
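The second speculative question, what additional libraries contribute beyond the original Google 5, can be framed as a set difference against the base group's distinct books. A minimal sketch, assuming holdings are sets of hypothetical book identifiers; `incremental_gain` is a hypothetical helper, and reading a library's "% of holdings unique relative to original G5 collection" as the share of its holdings not already held by the base group is an interpretation, not a definition taken from the study.

```python
def incremental_gain(base, additions):
    """Distinct books that `additions` contribute beyond the `base` group.

    base, additions: dicts mapping a library name to a set of book identifiers.
    """
    base_books = set().union(*base.values())
    added_books = set().union(*additions.values()) - base_books
    # Share of each added library's holdings not already in the base group.
    per_library = {
        name: len(books - base_books) / len(books)
        for name, books in additions.items()
    }
    return {
        "added_books": len(added_books),
        "relative_growth": len(added_books) / len(base_books),
        "per_library": per_library,
    }


# Hypothetical toy collections: a two-library base group and two additions.
base = {"A": {"b1", "b2", "b3"}, "B": {"b3", "b4"}}
additions = {"X": {"b1", "b5"}, "Y": {"b2", "b3", "b4", "b6"}}
result = incremental_gain(base, additions)
print(result)
```

Here the additions bring 2 new books to a base of 4 (relative growth 0.5); library X is half new, library Y a quarter new. In the presentation's terms, `relative_growth` corresponds to the Google 10 figure of 1.8 million books added to 10.5 million.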

Beyond the Google 5
                         New Google 5    Original Google 5
  Total holdings         ~8 million      ~18 million
  Total unique books     5.9 million     10.5 million
  % of system-wide       18 percent      33 percent
  Redundant holdings     26 percent      42 percent

Impact by library type (% of holdings unique relative to original G5 collection)
  Large US metropolitan library     39 percent (most unlike G5)
  Large US private university       25 percent
  Large Canadian university         23 percent
  Large US public university        21 percent
  Small US liberal arts college     13 percent (most like G5)

The Google 10
- Google 10 collection: 12.3 million books
  = Original Google 5 (10.5 million books) + 1.8 million (17 percent)
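The summary figures on these slides are consistent with one another, and a few lines of arithmetic make the relationships explicit. This sketch uses only numbers quoted in the presentation (in millions of books, with the system-wide print book collection taken as ~32 million); it assumes the redundancy rate is 1 - unique books / total holdings, and the helper name `summarize` is hypothetical. Because the inputs are rounded, the results agree with the slides only to within about a percentage point.

```python
SYSTEM_WIDE = 32.0  # million print books in WorldCat (January 2005), as quoted


def summarize(holdings_m, unique_m):
    """Share of the system-wide collection and redundancy, from rounded slide figures."""
    return {
        "share_of_system": unique_m / SYSTEM_WIDE,
        # Assumed definition: fraction of holdings that duplicate another holding.
        "redundant": 1 - unique_m / holdings_m,
    }


original_g5 = summarize(holdings_m=18.0, unique_m=10.5)
new_g5 = summarize(holdings_m=8.0, unique_m=5.9)

# Google 10: the original G5 plus the books the new G5 add beyond it.
added_m = 1.8
google10_m = 10.5 + added_m
growth = added_m / 10.5  # incremental extension to coverage
new_g5_unique_vs_original = added_m / 8.0  # new G5 holdings adding books

for label, value in [
    ("original G5 share of system-wide", original_g5["share_of_system"]),
    ("original G5 redundant holdings", original_g5["redundant"]),
    ("new G5 share of system-wide", new_g5["share_of_system"]),
    ("new G5 redundant holdings", new_g5["redundant"]),
    ("Google 10 growth over original G5", growth),
    ("new G5 holdings unique vs original G5", new_g5_unique_vs_original),
]:
    print(f"{label}: {value:.1%}")
print(f"Google 10 total: {google10_m:.1f} million")
```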

Diminishing returns?
- Original G5: ~18 million holdings, 58 percent unique
- New G5: ~8 million holdings, 22 percent unique (books not already in the original G5)

Anatomy of aggregate collections
- Mass digitization programs and other aggregate collections are increasingly common features of the library landscape
- Effective decision-making/planning is aided by convergence on a set of standard questions that help map out the anatomy of aggregate collections
- Example: mass digitization programs
  - What are the characteristics of the overarching population of materials that is the target of the digitization effort?
  - How much of the population will the digitization effort cover?
  - What is the potential degree of redundancy?
  - What bibliographic unit is the focus of digitization (e.g., manifestations, expressions, works)?
  - What number of participants and combination of institution types is optimal for obtaining maximum benefit with minimum cost?
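The last question above, which number and combination of participants gives the most benefit for the least cost, could be explored with a greedy heuristic for the maximum-coverage problem: repeatedly add the library that contributes the most books not yet covered. This is a standard, swapped-in technique, not the method used in the study; it ignores cost, treats every book as equally valuable, and runs on hypothetical toy holdings (`greedy_selection` is a hypothetical helper). Because adding a library can only shrink the uncovered pool, the marginal gains of successive greedy picks never increase, which is one way to see the diminishing returns noted above.

```python
def greedy_selection(holdings, k):
    """Pick up to k libraries, each time the one adding the most uncovered books.

    holdings: dict mapping a library name to a set of book identifiers.
    Returns the picks as (name, marginal_gain) pairs and the covered set.
    Ties go to the library listed first (max() returns the first maximum).
    """
    covered = set()
    remaining = dict(holdings)
    picks = []
    for _ in range(min(k, len(remaining))):
        name = max(remaining, key=lambda lib: len(remaining[lib] - covered))
        gain = len(remaining[name] - covered)
        covered |= remaining.pop(name)
        picks.append((name, gain))
    return picks, covered


# Hypothetical toy holdings for four libraries.
holdings = {
    "A": {"b1", "b2", "b3", "b4"},
    "B": {"b3", "b4", "b5"},
    "C": {"b1", "b2"},
    "D": {"b6"},
}
picks, covered = greedy_selection(holdings, k=3)
print(picks)
```

On the toy data the picks are A (4 new books), then B and D (1 each); library C, whose holdings are entirely contained in A's, is never chosen. Cost could be folded in by ranking candidates on new books per unit of cost instead of raw gain.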

Aggregate collections and WorldCat
- WorldCat is more than a tool for cataloging and reference; it is also a strategic resource for managing aggregate collections
- OCLC Group Services: http://www.oclc.org/groupservices/
- OCLC WorldCat Collection Analysis Service: http://www.oclc.org/collectionanalysis/
- OCLC Research data-mining activities: http://www.oclc.org/research/projects/mining/
