Real-world data movement and orchestration patterns using ...

Real-world data movement and orchestration patterns using ...

Learn. Connect. Explore. BRK2279 Real-World Data Movement and Orchestration Patterns

using Azure Data Factory Jason Horner, Attunix - @jasonhorner Cathrine Wilhelmsen, Inmeta - @cathrinew Agenda Overview

Design Patterns Preview of? Overview of Azure Data Factory

Azure Data Factory Sources Data Warehouse ETL / ELT

Analysis Reporting Azure Data Factory ETL / ELT Visual UI

Drag and Drop Code Support Python, .NET, ARM Control Flow Loop, Branch, If

SSIS Execution Lift and Shift ETL / ELT ETL - Extract Transform Load ETL - Extract Transform Load

ETL - Extract Transform Load ETL - Extract Transform Load ETL - Extract Transform Load ETL - Extract Transform Load

ELT - Extract Load Transform ELT - Extract Load Transform ELT - Extract Load Transform ELT - Extract Load Transform

ELT - Extract Load Transform ELT - Extract Load Transform ELT ETL

Azure Data Factory Concepts Azure Data Factory Concepts Pipelines Activities Linked

Services Datasets Triggers Integration Runtime

Azure Data Factory Design Patterns What are Design Patterns? Reusable solutions for common problems:

Description or template Formalized best practices Not finished designs that can be transformed directly into source or machine code Why use Design Patterns?

Use tested, proven and documented solutions to: Speed up development

Prevent issues than can cause problems later Improve code readability Design Patterns 1. 2. 3.

4. Truncate and Load Merge Load Incremental Load Bulk Table Transfer Full Extract: Truncate and Load

Specific use cases: All data needed, but replication is not available Small data sets that change often

No historical requirements Very simple, but can be considered an antipattern Full Extract: Truncate and Load Source

Sink Source Table Sink Table Full Extract: Merge Load

Specific use cases: All data needed, but replication is not available Medium data sets that have few changes

Need to minimize churn on the staging tables Adds complexity, doesnt solve the incremental extract from source Full Extract: Merge Load Source Source

Table Sink Table Type Stored Procedure Sink Table

Incremental Load Specific use cases: All data needed, including a robust history

Large data sets that have many changes Need to minimize churn on the staging tables and load on source systems Often requires changes to the source system (triggers, added columns, or engine features) Incremental Load

Source Sink Source Table Change Table

Change Tracking Current Version Table Type Stored

Procedure Control Table (High Watermark) Sink Table

Delta Detection Hash Comparison (Full Extract) High Watermark (Incremental Load) Change Tracking (Incremental Load) Other: column-by-column comparison, triggers, row versioning, modified dates, temporal

tables Delta Detection: High BE W ARY thes Watermark e appro of

ache s! Based on ascending integer or datetime Store the highest value in a control table or calculate by SELECT MAX() FROM Table Based on ascending date

Update or Create Assumes data is not updated and that the dates are maintained automatically Delta Detection: Change Tracking Lightweight solution for tracking data changes:

Has a row changed? Which rows have been changed? What kind of change was it? Which columns were changed?

Only tracks the latest change to a row Bulk Table Transfer Specific use cases:

Hundreds to thousands of tables to copy Similar loading patterns for all tables Need to minimize amount of code in solution Adds complexity, requires database tables to manage state

Bulk Table Transfer Source Source Table Sink Table Type

Stored Procedure Control Table List Sink Table Log Table

Auditing: Batches Every ETL Process should start by creating a Batch Batches are logical concepts used to tie multipipeline load processes together for Auditing and Logging A batch is closed when a nightly process is completed (Fail or Success)

Auditing: Common Columns CreatedDate - Date row was inserted CreatedBatchId - Batch that inserted row ModifiedDate - Date row was updated ModfiedBatchId - Batch that updated row IsDeleted - Indicates if record has been removed

Logging: Common Columns Row Counts - Selected, Inserted, Modified, Ignored ExecutionTime - Begin, End, Duration LoadStatus - Fail, Success Demo:

Solution Overview Jason Horner Design Patterns: Key Take Aways Model your Metadata correctly Make composable single purpose Pipelines Leverage Parameters and User Properties Lookup, Foreach, and Metadata, activities are powerful

Edit the JSON files directly when you hit a wall Preview of? Azure Data Factory Data Flows

Azure Data Factory Data Flows ETL / ELT Visual Authoring Drag and Drop Azure Databricks No Code

Transform At Scale Join, Split, Aggregate, Lookup, Filter, Sort, Derived Column Azure Data Factory Data Flows ETL / ELT

Demo: Azure Data Factory Data Flows Cathrine Wilhelmsen Thank you! Jason Horner, Attunix [email protected]

@jasonhorner Cathrine Wilhelmsen, Inmeta [email protected] @cathrinew Please evaluate this

session Your feedback is important to us! Please evaluate this session through MyEvaluations on the mobile app or website. Download the app: https://aka.ms/ignite.mobileApp

Go to the website: https://myignite.techcommunity.microsoft.com/evalu ations Copyright Microsoft Corporation. All rights reserved.

Recently Viewed Presentations

  • Web Services Integration Security

    Web Services Integration Security

    External Interfaces Update May 21, 2007 Daryl Shing Agenda Outcomes from recent API sub-group meetings Further Web Services Status of the Interface Specification Outcomes from recent API sub-group meetings Agreed that the specification for the initial release of services could...
  • Statistical Software Programming STAT 6360 Statistical Software Programming

    Statistical Software Programming STAT 6360 Statistical Software Programming

    The SAS DM is very helpful. We'll learn it too. * STAT 6360 -Statistical Software Programming First, An Example: pets.sas From eLC, download the SAS file called pets.sas. You will find this file in a folder (module) called "SAS Code"....
  • SEG 2100 Software Design II

    SEG 2100 Software Design II

    Fa├žade (GofF) Proxy(GofF) Information Expert (Expert) (Grasp Pattern) Creator (Grasp Pattern) Controller Pattern (Grasp Pattern) Chapter 6: Using design patterns Chapter 6: Using design patterns Chapter 6: Using design patterns * * * * Chapter 6: Using design patterns Chapter...
  • AQUATIC ECOSYSTEMS - Henriksen Science

    AQUATIC ECOSYSTEMS - Henriksen Science

    AQUATIC ECOSYSTEMS What are the basic needs of aquatic biota (organisms)? CO2 O2 Sunlight Nutrients- food & minerals Types of Aquatic Ecosystems Freshwater Ecosystems Standing Water- lakes & ponds Moving Water- rivers & streams Transitional Communities Estuaries Wetlands- bogs/fens, swamps...
  • NISHAAN SAHIB - tampagurmatcamp.com

    NISHAAN SAHIB - tampagurmatcamp.com

    Nishan sahib belongs to the guru. The Mughal said, "We will cut off your legs". Bhai . Alam. Singh boldly defying, responded, "Then, I will hold it with my mouth! This flag belongs to my GURU, I will never let...
  • Introduction to National 5 English

    Introduction to National 5 English

    - you will have to perform a solo talk or group discussion. Solo talks must be at least 3 minutes long. Group discussions must be at least 10 minutes long and each person must contribute significantly. Writing - if one...
  • Moving on to our Second Report! Due to

    Moving on to our Second Report! Due to

    Moving on to our Second Report! Due to NWCCU by Fall 2012 * Accreditation Steering Committee Bill Briare, Kim Carey, Kate Gray, Phillip King, Wes Locke, Elizabeth Lundy, Terry Mackey, Steffen Moller, Sharon Parker, Judy Redder, Tara Sprehe, Bill Waters...
  • Title Slide option 1 With photo

    Title Slide option 1 With photo

    John gillespie PhD. Health economics, health policy & reimbursement. ... healthcare providers and clinicians the actual results of care provided and how care can be improved," said Dr Lyons. ... ICHOM was founded in 2012 by Professor Michael Porter of...