The Query Mesh Project: A Powerful Multi-Route Query

The Query Mesh Project: A Powerful Multi-Route Query

The Query Mesh Project: A Powerful Multi-Route Query Processing Paradigm Rimma V. Nehme Microsoft Jim Gray Systems Lab [email protected] Elke. A. Rundensteiner Worcester Polytechnic Institute [email protected] Elisa Bertino Purdue University [email protected] New England Database Summit 2010 Thanx goes to NSF 0917017 for partial support of this project. 1 Motivation A variety of modern applications face data with non-uniform characteristics ubiquitous healthcare, location-based services, financial tickers, network monitoring Data Sources I want my results quickly. I dont Data care how exactly they are computed Database Engine

TYPICALLY 1.234 TYPICALLY ONE Query ONE execution execution Executor plan for plan for Query QueryDATA ALL ALL DATA Optimizer Plan Cost Overall Statistics Query Execution Plan Query Results Optimizer SELECT * SELECT FROM * FROM Query 2

Concrete Example: Network Monitoring Network Monitoring Here example is with streaming data Similar examples can be found with static data Multi-Plan Query Processing Single (/Route) Plan Query Processing Opportunity for Opportunity for Improvement: Improvement: Plan 1 Plan 2 Plan 3 Network packets Query It may be more efficient to use Execution It may be more efficient to usePlan different plans for different subsets different plans for different subsets of data Data Streams of data Query Query Optimizer DSMS

Optimizer SELECT * SELECT FROM * FROM Query Results Continuous Query 3 Outline Introduction & Motivation Background : Query Mesh Model Optimization Execution Dynamic Re-Optimization with Query Mesh Challenges Architecture Details Experimental Evaluation

Ongoing and future work Conclusion 4 Multi-Plan Query Processing Using Query Mesh Query Mesh provides a middle ground between a single pre-computed route and multiple runtime routes systems Physical Architecture of Eddies and its descendants Query Mesh Framework Traditional Query Optimization Eddy Single route-oriented solution Coarse optimization Small overhead Multi route-less solution Query Mesh Fine-granularity optim. Significant overhead Classifier Multiple routes Multi route-oriented solution Fine-granularity optimization Less overhead

(Here, route = execution plan 5 Query Mesh Search Space Search Space: the set of all possible solutions 1234 One plan for all data Search Space Complexity 14/23 1/234 124/3 13/24 123/4 134/2 12/34 Bell number Bn = sum of Stirling numbers of second kind S(n,k) Query Mesh Lattice Shaped Search Space 1/23/4 14/2/3 1/24/3 13/2/4 12/3/4 1/2/34 Stirling number of the second kind S(n, k) is the number of ways to partition a setEach of subset cardinality n has 1/2/3/4 route into exactly k nonemptyindividual subsets Set of training tuples {1,2,3,4}* has cardinality n =4 * We denote {{1},{2,3}} as 1/23 for brevity 6 Query Mesh Optimization Problem Query Mesh Cost Model

(main idea) Cost(QM) = Cost of Classifier + Cost of routes + Multiroute overhead Query Mesh Search Algorithms Optimal Query Mesh Search (Opt-QM) Query Mesh Search Heuristics Start Three components of search heuristics: solution (1) Start Solution (1) Form all possible sets for the 5 different approaches - extreme-1, extreme-N, random, given powerset content-driven, route-driven Experimentally evaluated (2) Search Strategy Final Randomized algorithms (2 ) Form partitions out of the solution -Iterative Improvement - Simulated annealing above sets (3) Stop condition Largely depends on the search strategy employed -K-iterations, Plateau, Time-bounded, 7 Too expensive! Need heuristics! Resource-bounded = explored solutions Main idea: Query Mesh Optimization Overview Query Query Optimizer

Optimizer Sample of Tuples (training dataset) - QM Optimizer - QM Executor Query Executor Query Executor sample and so on Compute Routes (i.e., plans) r1 rt212 t12 t11 t11 t10 t10 r3 t9 t9 t8 t8 t7 t7

sample t6 t6 t5 t5 t4 t4 Data Stream Induce Classifier sample t3 t3 t2 t2 t1 Query Mesh t1 r4 r1

r1 r2 r2 r4 r4 [NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology, 8 (Demo) In VLDB 2009. Query Mesh Execution Overview Query Query Optimizer Optimizer - QM Optimizer - QM Executor Query Executor Query Executor Data Stream t12 t12 t11 t11 t10 t10

t9 t9 t8 t8 t7 t7 t6 t6 t5 t4 t5 t3 t4 t2 t3 t1 t2 t1 Classification Window (tumbling window) t5 After Classification [NWRB09] R. Nehme, K. Works, E. Rundensteiner and E.

Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo), In VLDB 2009. t5 r-tokens data tuples t4 t9 t4 t9 t10 t10 t3 t6 t8 t3 t6 t8 t1 t2 t7 Send to Self-Routing Fabric t1 t2 t7 <1,4,3,2 route r1

> <2,4,3,1 route r2 > <3,4,1,2 route r3 > rusters 9 But data characteristics may change At time T At time T + 1 At time T + 2 At time T + 3 10 Dynamic Re-optimization with Query Mesh Can we have an execution strategy that is plan-based supports different plans for distinct subsets of data is as adaptive as Eddies Self-Tuning Query Mesh (STQM) [NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009. 11 Outline

Introduction & Motivation Background : Query Mesh Model Optimization Execution Dynamic Re-Optimization with Query Mesh Challenges Architecture Details Conclusion Current and Future Work 12 Contributions Challenges Query Mesh Self-TuningQuery Mesh Classifier Multiple routes 1. 2.

3. What should be monitored to determine whether the current QM solution is no longer adequate? Data and Statistics Monitoring How to determine if the current QM solution should be adapted? Concept Drift Analysis, QM Cost Model, Improvement Measure How to efficiently execute the physical migration from the current QM to a new QM solution while the query is being Single Lightweight Operation to Physically Adapt QM executed? [NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive MultiRoute Query Processing, In EDBT 2009. 13 ST-QM Architecture Query Optimiz er Query Executor Static QM Framework ST-QM Query Optimiz er Query Executor Adaptive QM Framework [NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive MultiRoute Query Processing, In EDBT 2009. 14

ST-QM Components ST-QM measurements ST-QM ST-QM Monitor Monitor sampling recommendations ST-QM ST-QM Analyzer Analyzer New New Query Query Query Query Mesh Mesh Mesh Mesh ST-QM ST-QM Actuator Actuator actuation ST-QM Monitor continuously samples data and execution statistics that will be used to determine if a concept drift has occurred (i.e., QM needs to be adapted) ST-QM Analyzer determines if a concept drift has actually occurred and makes

recommendations if and how the QM solution should be adapted ST-QM Actuator takes these recommendations and physically adapts the QM solution 15 ST-QM Actuator: Physical Query Mesh Adaptation All possible recommendations: Case 1: Virtual Concept Drift Recommendation Case 2: Real Concept Drift Recommendation Case 3: Hybrid Concept Drift Recommendation R3 New Classifier + New R2 Old Classifier + New Routes Routes Self-Routing Fabric Query Mesh Query Mesh r1 Query R1 New Classifier + Old Routes Query Mesh New Classifier Data r2

r3 rusters r1 Current Classifier The The beauty beautyof of the the proposed proposed design!!! design!!! r2 r3 Online Classifier rusters 0 1 2 3 4 opi opi results

opk opl Op-modules OI-array Classifier Modification 17 Experimental Evaluation ST-QM was implemented inside Java-based continuous query engine called CAPE Compare its relative performance against competitor systems, namely, we compared adaptive QM against: Static (non-adaptive) QM, Adaptive plan-less Eddies Adaptive plan-less Eddies with CBR-based routing policy Results can be found in EDBT 2010. 18 Summary of ST-QM Experimental Results ST-QM gave up to 44% improvement in execution time and output rate compared to nonadaptive QM, Eddy and single plan execution

approach The runtime overhead of ST-QM relative to query execution is small (on average 2%). The actuation cost of physical adaptivity is nearly negligible resulting in 0.02% of total execution cost Even if no adaptivity is needed, ST-QMs performance in the worst case will be at most 2-19 Conclusion Query Mesh is practical query optimization approach Eliminates single plan assumption Feasibility shown Has low overhead & high potential benefit Easily implemented and integrated with existing systems Query Mesh leads to novel solutions Usage of machine learning in query optimization and query processing Usage of network-inspired techniques in query optimization and query processing 20

Next Steps in QM Project Consider state caching and indexing in QM stream context Work with alternate classification methods for route decisions Design customized query optimization and processing strategies Study multi-query processing and optimization Scale by applying distributed processing technologies 21 Thank you to current and past DSRG members for stream engine development, feedback, collaboration, and much more. Thank You for Listening !!!!! 22

Recently Viewed Presentations

  • Ethics and global health research

    Ethics and global health research

    Organizational Needs Global Needs Critical interpretive synthesis International survey 12 Humanitarian health workers 12 humanitarian policy-makers JORDAN: refugee camp GUINÉE: Ebola treatment centre RWANDA: long-term refugee camp Natural disaster KEVIN & CARRIE "Critical Needs" Vinsan Saito 2010 Ronald Searle 1943...
  • Evolutionary Classification

    Evolutionary Classification

    Divergent evolution describes evolution toward different traits in closely related species. Divergent evolution can lead to . speciation. kit fox. ancestor. red fox. EVOLUTIONARY CLASSIFICATION. Biologists group organisms into categories that represent lines of .
  • CONTRIBUTION Engagement - The Essential Element 1 Personality

    CONTRIBUTION Engagement - The Essential Element 1 Personality

    Thriving at work is the psychological state in which individuals experience both a sense of vitality and a sense of learning at work.1. Vitality: the positive feeling of having energy available, reflecting feelings of aliveness.
  • Microphysics and boundarylayer research from long-term Doppler lidar

    Microphysics and boundarylayer research from long-term Doppler lidar

    Microphysics and boundary-layer research from long-term Doppler lidar observations Resolving small-crystal controversy Aircraft probe measurements: enormous numbers (1 cm-3) of tiny crystals < 50mm in cirrus Cloud physics community is divided: are these real or an artefact due to break-up...
  • United Way of Greater Plymouth County Moving Give Away

    United Way of Greater Plymouth County Moving Give Away

    Office Supplies: Pick Up ASAP(first come, first served) Contact Kim. 508-583-6306 ext. 106. United Way of Greater Plymouth County Moving Give Away. 2 Tables. Office Supplies. United Way of Greater Plymouth County Envelopes. 1 box of plain, ~ 5 boxes...
  • I N D I A N A U

    I N D I A N A U

    Getting More for Less: A Software Distribution Model John V. Samuel, Craig A. Stewart, and Kevin J. Wilhite University Information Technology Services
  • Induction and Inductance

    Induction and Inductance

    An inductor is a device that can be used to produce a known magnetic field in a specified region. If a current . i. is established through each of the . N . windings of an inductor, a magnetic flux...
  • Statins and diabetes - a link?

    Statins and diabetes - a link?

    Current status of fenofibrate for DR? No clear national (or international) position. RCOphth: "Consider adding fenofibrate to a statin for non-proliferative retinopathy in type 2 diabetes.