Content Optimization on Yahoo! Front Page

Content Optimization on Yahoo! Front Page

An Overview of Cloud Computing @ Yahoo! Raghu Ramakrishnan Chief Scientist, Audience and Cloud Computing Research Fellow, Yahoo! Research Reflects many discussions with: Eric Baldeschwieler, Jay Kistler, Chuck Neerdaels, Shelton Shugar, and Raymie Stata and joint work with the Sherpa team, in particular: Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Rodrigo Fonseca and Nick Puz in Y! Research Chuck Neerdaels, P.P. Suryanarayanan and many others in CCDI 1 Questions What is cloud computing? Horizontal and functional services Whats it going to change? Software business models, science, life How many clouds will there be? 1, 2, 3, infinity Whats new in cloud computing? HPC grids, ASPs, hosted services, Multics (!) Emerging cloud stack to support a broad class of programs, including data intensive applications 2 Pie-in-the-sky

SCENARIOS 3 Living in the Clouds We want to start a new website, FredsList.com Our site will provide listings of items for sale, jobs, etc. As time goes on, well add more features And illustrate how more cloud capabilities (and corresponding infrastructure components) are used as needed List of capabilities/components is illustrative, not exhaustive Our cloud provides a dataset abstraction FredsList doesnt worry about the underlying components 4 Step 1: Listings Scenario FredsList wants to store listings as (key, category, description) FredsList.com application 1234323, transportation, For sale: one bicycle, barely used 5523442, childcare, Nanny available in San Jose

215534, wanted, Looking for issue 1 of Superman comic book DECLARE DECLAREDATASET DATASETListings ListingsAS AS ( (ID IDString StringPRIMARY PRIMARYKEY, KEY, Category CategoryString, String, Description DescriptionText Text) ) Simple Web Service APIs PNUTS Database 5

Step 2: System Evolution Fred belatedly realizes prices are useful information! ALTER ALTERDATASET DATASETListings Listings ADD (Price Float) ADD (Price Float) FredsList.com application 1234323, transportation, For sale: one bicycle, barely used 5523442, childcare, Nanny available in San Jose 215534, wanted, Looking for issue 1 of Superman comic book

32138, camera, Nikon D40, USD 300 Simple Web Service APIs vs. PNUTS Database Not every record in a dataset has values defined for all fields declared for the dataset Schemas are flexible, and evolve 6 Step 3: Search Federation of systems offering different capabilities FredsLists customers quickly ask for keyword search FredsList.com application

dvds bicycle nanny ALTER ALTERListings Listings SET SETDescription DescriptionSEARCHABLE SEARCHABLE Simple Web Service APIs PNUTS Vespa Database Search Tribble Messaging 7 Step 4: Photos FredsList decides to add photos/videos to listings

Federation of systems offering different performance points FredsList.com application ALTER ALTERListings Listings ADD ADDPhoto PhotoBLOB BLOB Simple Web Service APIs PNUTS Foreign key MObStor photo listing Vespa Storage

Database Search Tribble Messaging 8 Step 5: Data Analysis FredsList wants to analyze its listings to get statistics about category, do geocoding, etc. FredsList.com application Pig query to analyze categories Hadoop program to generate fancy pages for listings Hadoop program to geocode data ALTER ALTERListings Listings MAKE MAKEANALYZABLE ANALYZABLE

Simple Web Service APIs Grid PNUTS Foreign key MObStor Vespa photo listing Compute Database Batch export Storage Search Tribble Messaging 9 Step 6: Performance FredsList wants to reduce its data access latency And by now, Fred is

global, and wants georeplication! FredsList.com application ALTER Listings ALTER Listings MAKE MAKECACHEABLE CACHEABLE Simple Web Service APIs Grid PNUTS Foreign key MObStor Vespa memcached photo listing Compute Database Batch export Storage

Search Caching Tribble Messaging 10 Data Serving vs. Analysis Very different workloads, requirements Data from serving system is one of many kinds of data (click streams are another common kind, as are syndicated feeds) to be analyzed and integrated The result of analysis often goes right back into serving system 11 Motherhood-and-Apple-Pie EYES TO THE SKIES 12 Why Clouds? On-demand infrastructure to create a fundamental shift in the OE curve: Do things we cant do Build more robustly, more

efficiently, more globally, more completely, more quickly, for a given budget Cloud services should do heavy lifting of heavy-lifting of scaling & high-availability Today, this is done at the applevel, which is not productive 13 Requirements for Cloud Services Multitenant. A cloud service must support multiple, organizationally distant customers. Elasticity. Tenants should be able to negotiate and receive resources/QoS on-demand. Resource Sharing. Ideally, spare cloud resources should be transparently applied when a tenants negotiated QoS is insufficient, e.g., due to spikes. Horizontal scaling. It should be possible to add cloud capacity in small increments; this should be transparent to the tenants of the service. Metering. A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of

the service. Security. A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud. Availability. A cloud service should be highly available. Operability. A cloud service should be easy to operate, with few operators. Operating costs should scale linearly or better with the capacity of the service. 14 Types of Cloud Services Two kinds of cloud services: Horizontal (Platform) Cloud Services Functionality enabling tenants to build applications or new services on top of the cloud Functional Cloud Services Functionality that is useful in and of itself to tenants. E.g., various SaaS instances, such as Saleforce.com; Google Analytics and Yahoo!s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., flickr, Groups, Mail, News, Shopping Could be built on top of horizontal cloud services or from scratch Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges) 15 Opening Up Yahoo! Search Phase 1 Giving site owners and developers control over the appearance of Yahoo!

Search results. Phase 2 BOSS takes Yahoo!s open strategy to the next level by providing Yahoo! Search infrastructure and technology to developers and companies to help them build their own search experiences. 16 BOSS Offerings BOSS offers two options for companies and developers and has partnered with top technology universities to drive search experimentation, innovation and research into next generation search. API A self-service, web services model for developers and start-ups to quickly build and deploy new search experiences. CUSTOM Working with 3rd parties to build a more relevant, brand/site specific web search experience. This option is jointly built by Yahoo! and select partners. ACADEMIC

Working with the following universities to allow for wide-scale research in the search field: University of Illinois Urbana Champaign Carnegie Mellon University Stanford University Purdue University MIT Indian Institute of Technology Bombay University of Massachusetts (Slide courtesy Prabhakar Raghavan) 18 Partner Examples 19 Horizontal Cloud Services Horizontal cloud services are foundations on which tenants build applications or new services. They should be: Semantics-free. Must be "generic infrastructure, and not tied to specific app-logic. May provide the ability to inject application logic through well-defined APIs Broadly applicable. Must be broadly applicable (i.e., it can't be

intended for just one or two properties). Fault-tolerant over commodity hardware. Must be built using inexpensive commodity hardware, and should mask component failures. While each cloud service provides value, the power of the cloud paradigm will depend on a collection of well-chosen, loosely coupled services that collectively make it easy to quickly develop and operate innovative web applications. 20 Yahoo! Cloud Stack EDGE Brooklyn Horizontal Cloud Services YCPI WEB VM/OS Horizontal Cloud ServicesPHP yApache APP VM/OS Horizontal

Cloud Serving Grid Services STORAGE Sherpa Horizontal Cloud Services MOBStor App Engine Data Highway Monitoring/Metering/Security Provisioning (Self-serve) YCS BATCH Hadoop HorizontalCloud Services 22 Yahoo! CCDI Thrust Areas

Fast Provisioning and Machine Virtualization: On demand, deliver a set of hosts imaged with desired software and configured against standard services Multiple hosts may be multiplexed onto the same physical machine. Batch Storage and Processing: Scalable data storage optimized for batch processing, together with computational capabilities Operational Storage: Persistent storage that supports low-latency updates and flexible retrieval Edge Content Services: Support for dealing with network topology, communication protocols, caching, and BCP Rest of todays talk 23 Web Data Management Scan oriented workloads Focus on sequential disk I/O $ per cpu

cycle Large data analysis (Hadoop) Structured record storage (PNUTS/Sherpa) Blob storage (SAN/NAS) CRUD Point lookups and short scans Index organized table and random I/Os $ per latency Object retrieval and streaming Scalable file storage $ per GB 24

Hadoop: Batch Storage/Analysis Why is batch processing important? [Workflow] Whether its response-prediction for advertising machine-learned relevance for Search, or content optimization for audience, data-intensive computing is increasingly central to everything Yahoo! does Hadoop is central to addressing this need High-level query layer (Pig) Map-Reduce HDFS Hadoop is a case-study in our cloud vision Processes enormous amounts of data Provides horizontal scaling and faulttolerance for our users Allows those users to focus on their app

logic 25 The World Has Changed Web serving applications need: Scalability! Preferably elastic Flexible schemas Geographic distribution High availability Reliable storage Web serving applications can do without: Complicated queries Strong transactions 26 MObStor Yahoo!s next-generation globally replicated, virtualized media object storage service Better provisioning, easy migration, replication, better BCP, and performance New features (Evergreen URLs, CDN integration, REST API, )

The object metadata problem addressed using Sherpa, though MObStor is focused on blob storage. 2727 Storage & Delivery Stack 28 PNUTS / SHERPA To Help You Scale Your Mountains of Data 29 CCDIResearch Collaboration Yahoo! Research CCDI

Raghu Ramakrishnan Brian Cooper Utkarsh Srivastava Adam Silberstein Rodrigo Fonseca Chuck Neerdaels P.P.S. Narayan Kevin Athey Toby Negrin Plus Dev/QA teams 30 Yahoo! Serving Storage Problem Small records 100KB or less Structured records lots of fields, evolving Extreme data scale - Tens of TB Extreme request scale - Tens of thousands of requests/sec Low latency globally - 20+ datacenters worldwide High Availability - outages cost $millions Variable usage patterns - as applications and users change 31 31 What is PNUTS/Sherpa? A

B 42342 42521 E W C 66354 W CREATE CREATETABLE TABLEParts Parts(( ID IDVARCHAR, VARCHAR, StockNumber StockNumberINT, INT, Status VARCHAR Status VARCHAR ))

D E F 12352 75656 15677 E C E Structured, Structured,flexible flexibleschema schema Parallel Paralleldatabase database A B C D E F 42342 42521 66354

12352 75656 15677 E W W E C E A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E

Geographic Geographicreplication replication Hosted, Hosted,managed managedinfrastructure infrastructure 33 33 What Will It Become? A B C D E F 42342 42521 66354 12352 75656 15677 E

W W E C E A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E Indexes Indexesand andviews views

A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E 35 Design Goals Scalability Consistency

Thousands of machines Easy to add capacity Restrict query language to avoid costly queries Per-record guarantees Timeline model Option to relax if needed Geographic replication Multiple access paths Asynchronous replication around the globe Low-latency local access Hash table, ordered table Primary, secondary access High availability and fault tolerance

Hosted service Automatically recover from failures Serve reads and writes despite failures Applications plug and play Share operational cost 36 36 Technology Elements Applications Tabular API PNUTS API YCA: Authorization PNUTS Query planning and execution Index maintenance

Distributed infrastructure for tabular data Data partitioning Update consistency Replication YDOT FS Ordered tables YDHT FS Hash tables Tribble Pub/sub messaging Zookeeper Consistency service 37 37 Data Manipulation Per-record operations Get Set Delete Multi-record operations Multiget Scan Getrange Web service (RESTful) API 38

38 TabletsHash Table 0x0000 Name Description Grape Grapes are good to eat $12 Lime Limes are green $9 Apple Apple is wisdom $1 Strawberry 0x2AF3

0xFFFF $900 Arrgh! Dont get scurvy! $2 Avocado But at what price? $3 Lemon How much did you pay for this lemon? $1 Tomato Is this a vegetable? $14 Banana The perfect fruit

$2 New Zealand $8 Orange 0x911F Strawberry shortcake Price Kiwi 39 39 TabletsOrdered Table A Name Description Price Apple

Apple is wisdom $1 Avocado But at what price? $3 Banana The perfect fruit $2 Grape Grapes are good to eat $12 New Zealand $8 How much did you pay for this lemon? $1 Limes are green

$9 H Kiwi Lemon Lime Q Orange Strawberry Tomato Arrgh! Dont get scurvy! $2 Strawberry shortcake $900 Is this a vegetable? $14 Z 40 40

Flexible Schema Posted date Listing id Item Price 6/1/07 424252 Couch $570 6/1/07 763245 Bike $86 6/3/07 211242 Car

$1123 6/5/07 421133 Lamp $15 Color Condition Good Red Fair 41 Detailed Architecture Remote regions Local region Clients REST API Routers Tribble

Tablet Controller Storage units 42 42 Tablet Splitting and Balancing Each Eachstorage storageunit unithas hasmany manytablets tablets(horizontal (horizontalpartitions partitionsofofthe thetable) table) Storage Storageunit unitmay maybecome becomeaahotspot hotspot Storage unit Tablet

Overfull Overfulltablets tabletssplit split Tablets Tabletsmay maygrow growover overtime time Shed Shedload loadby bymoving movingtablets tabletstotoother otherservers servers 43 43 QUERY PROCESSING 44 44

Accessing Data 4 Record for key k 1 Get key k 3 Record for key k SU SU 2 Get key k SU 45 45 Bulk Read 1 {k1, k2, kn} 2

Get k1 Get k2 SU SU Get k3 Scatter/ gather server SU 46 46 Range Queries in YDOT Clustered, ordered retrieval of records Apple Avocado GrapefruitPear? Banana Blueberry Canteloupe Grape Kiwi Lemon

GrapefruitLime? LimePear? Router Lime Mango Orange Strawberry Apple Tomato Avocado Watermelon Banana Blueberry Storage unit 1 Canteloupe Storage unit 3 Lime Storage unit 2 Strawberry Storage unit 1 Strawberry Tomato Watermelon Storage unit 1

Lime Mango Orange Canteloupe Grape Kiwi Lemon Storage unit 2 Storage unit 3 47 Updates 1 8 Write key k Sequence # for key k Routers Message brokers 3 Write key k 2

7 Sequence # for key k 4 Write key k 5 SU SU SU 6 SUCCESS Write key k 48 48 ASYNCHRONOUS REPLICATION AND CONSISTENCY 49 49

Asynchronous Replication 50 50 Consistency Model Goal: Make it easier for applications to reason about updates and cope with asynchrony What happens to a record with primary key Alice? Record inserted Update v. 1 Update Update Update v. 2 v. 3

v. 4 Update Update v. 5 v. 6 Generation 1 v. 7 Delete Update v. 8 Time Time As the record is updated, copies may get out of sync. 51 51 Example: Social Alice East West

User Status Alice ___ User Status Alice Busy User Status User Status Alice Busy Alice Free

User Status User Status Alice ??? Alice ??? Record Timeline ___ Busy Free Free 52 Consistency Model Read Stale version

v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8 Time In general, reads are served using a local copy 53 53 Consistency Model

Read up-to-date Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8 Time But application can request and get current version 54

54 Consistency Model Read v.6 Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8 Time Or variations such as read forwardwhile copies may lag the

master record, every copy goes through the same sequence of changes 55 55 Consistency Model Write Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8

Time Achieved via per-record primary copy protocol (To maximize availability, record masterships automaticlly transferred if site fails) Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors) 56 56 Consistency Model Write if = v.7 ERROR Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1

v. 7 Current version v. 8 Time Test-and-set writes facilitate per-record transactions 57 57 Consistency Techniques Per-record mastering Each record is assigned a master region May differ between records Updates to the record forwarded to the master region Ensures consistent ordering of updates Tablet-level mastering Each tablet is assigned a master region Inserts and deletes of records forwarded to the master region Master region decides tablet splits These details are hidden from the application Except for the latency impact!

58 Mastering A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E A B C D E F

Tablet master A B C D E F 42342 42521 66354 12352 75656 15677 42342 42521 66354 12352 75656 15677 E W W E C E

E W W E C E 59 59 Bulk Insert/Update/Replace Client Source Data Bulk manager 1. Client feeds records to bulk manager 2. Bulk loader transfers records to SUs in batches Bypass routers and message brokers Efficient import into storage unit 60 Bulk Load in YDOT YDOT bulk inserts can cause performance

hotspots Solution: preallocate tablets 61 Index Maintenance How to have lots of interesting indexes and views, without killing performance? Solution: Asynchrony! Indexes/views updated asynchronously when base table updated 62 SHERPA IN CONTEXT 63 63 Types of Record Stores Query expressiveness S3 PNUTS Simple Object retrieval

Retrieval from single table of objects/records Oracle Feature rich SQL 64 Types of Record Stores Consistency model S3 PNUTS Best effort Eventual consistency Timeline consistency Object-centric Object-centric consistency consistency Oracle

ACID Strong guarantees Program Program centric centric consistency consistency 65 Types of Record Stores Data model PNUTS CouchDB Flexibility, Schema evolution Object-centric Object-centric consistency consistency Oracle Optimized for Fixed schemas

Consistency Consistency spans spans objects objects 66 Types of Record Stores Elasticity (ability to add resources on demand) Oracle PNUTS S3 Inelastic Elastic Limited (via data distribution) VLSD (Very Large Scale Distribution / Replication) 67

Data Stores Comparison Versus PNUTS User-partitioned SQL stores Microsoft Azure SDS Amazon SimpleDB Multi-tenant application databases Salesforce.com Oracle on Demand More expressive queries Users must control partitioning Limited elasticity Highly optimized for complex workloads Limited flexibility to evolving applications Inherit limitations of underlying data management system

Mutable object stores Amazon S3 Object storage versus record management 68 Application Design Space Get a few things Sherpa MySQL Oracle BigTable Scan everything Everest Records MObStor YMDB Filer

Hadoop Files 69 69 SQL/ACID Consistency model Updates Structured access Global low latency Availability Operability Elastic Alternatives Matrix Sherpa

Y! UDB MySQL Oracle HDFS BigTable Dynamo Cassandra 70 70 Further Reading Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008) Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008) Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni Asynchronous View Maintenance for VLSD Databases, Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava and Raghu Ramakrishnan SIGMOD 2009 (to appear) Cloud Storage Design in a PNUTShell

Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava Beautiful Data, OReilly Media, 2009 (to appear) 71 QUESTIONS? 72 72

Recently Viewed Presentations

  • Records Management: An Introduction - UMIACS

    Records Management: An Introduction - UMIACS

    Records Management: An Introduction LBSC/INFM 708X Dr. Jean Dryden 2 February 2009 Records Management "Field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use and disposition of records, including processes for capturing and maintaining...
  • Chapter 8 Life at the Turn-of-the-20th-Century

    Chapter 8 Life at the Turn-of-the-20th-Century

    Chapter 8Life at the Turn-of-the-20th-Century. Summary. SECTION 1. SECTION 2. SECTION 3. SECTION 4. Science and Urban Life. Expanding Public Education. Segregation and Discrimination. The Dawn of Mass Culture. New technologies improve urban living, and a modern mass culture emerges....
  • Unit 2 - WordPress.com

    Unit 2 - WordPress.com

    CHART. ECONOMIST. MANAGEMENT. ... BLOW . DANDELION. MAKE A WISH. What is happening in the picture? Use both simple present and present continuous! WORK OUT . ABS. SIX-PACK. LIFT WEIGHTS. What is happening in the picture? Use both simple present...
  • Aeronautical Decision Making (ADM)

    Aeronautical Decision Making (ADM)

    "ADM is a systematic approach to the mental processes used by pilots to consistently determine the best course of action in response to a given set of circumstances." (FAA PHAK) AOPA - " The goal of ADM is simple: doing...
  • Head Neck Cancer- an Awareness Drive

    Head Neck Cancer- an Awareness Drive

    Cancer is a class of diseases characterized by out-of-control cell growth. Malignant tumors form when two things occur: a cancerous cell manages to move throughout the body using the blood or lymph systems, destroying healthy tissue in a process called...
  • Materials Handling Slide Presentation

    Materials Handling Slide Presentation

    Introduction. Lesson objectives: Identify types of material handling equipment. Describe hazards associated with material handling activities. Identify methods to prevent hazards associated with material handling equipment.
  • Decision Support for Quality Improvement

    Decision Support for Quality Improvement

    Next, their workflow must be analyzed and a determination must be made as to how the clinical decision support will fit into that workflow. It is important to understand that there may be many different workflow patterns, not just a...
  • Transparentní intensionální logika (TIL)

    Transparentní intensionální logika (TIL)

    the law of universal instantiation, lambda conversion and Leibniz's Law do not generally hold, all of which is rather unattractive. Worse, IL does. not . validate the Church-Rosser 'diamond'. It is a well-known fact that an ordinary typed -calculus will...