IST722 Data Warehousing - Syracuse University

IST722 Data Warehousing - Syracuse University

IST722 Data Warehousing Physical Design Michael A. Fudge, Jr. Pop Quiz! For dimensional modeling define these: Conformed dimension Degenerate dimension Junk Dimensions Type 1,2,3 SCDs 3 types of facts 3 fact table grains

Pop Quiz! - Answers Dimensional Modeling Conformed dimension Shared Among DMs Degenerate dimension Dimensions in the Fact table Junk Dimensions Categorical Dimension / Catch All Type 1,2,3 SCDs 1. replace, 2. new row, 3. new column

3 types of facts Additive, Semi-Additive, Non-Additive 3 fact table grains Trans. / Periodic Snap. / Accumulating Snap So, where are we? Last Week: We covered: Dimensional Modeling We learned how to: Design dimensional models for relational databases.

Detailed Design This Week: Well cover ROLAP Implementation of Dimensional Models Well learn how to Implement dimensional models in relational databases. Technical

Recall: Kimball Lifecycle The Goal: Detailed Design to ROLAP Implementation Todays Agenda: Describe the process of implementing dimensional model designs in a relational database (ROLAP) Discuss approaches to implementation Walk through an implementation together using a case-study, so you can see this in action! The Physical Design Process, At a Glance Design

Develop Standards Detailed Dimensional Model / Physical Model Development Environment Instantiate Relational Database

Develop Security, Auditing and Staging tables and Index plan Test Environment Add Aggregations and improved Indexes Finalize database Designs

Design ROLAP Database & Test / Verify A word about Environments. Networked so others can access it Should be identical to prod in data and function. Measure performance here. Isolated to the

Developers Can use subsets of data Not for testing Test Dev Prod Our In-Class Case Study: Fudgemart Employee Time Sheets We will: Implement the ROLAP Schema Load with data to test / verify the model

Lets see the Detailed Design Workbook The ROLAP Star Schema Simple Data Mart Well use this throughout our lesson today. You can Generate the SQL from the Excel Dimensional Modeling Worksheet!!!

Developing Standards Design Develop Standards Detailed Dimensional Model / Physical Model Development Environment

Instantiate Relational Database Develop Security, Auditing and Staging tables and Index plan Test Environment Add Aggregations and improved Indexes

Finalize database Designs Design ROLAP Database & Test / Verify Naming Conventions Follow your organizations naming conventions Develop them if you dont have any! Consistency is key here Examples: Customer_Dim DimCustomer I use this one

Dim == Dimension dim_customer Fact == Fact Table [Dim Customer] Stg == Staged Data To Null or Not to Null? The attributes in your dimension tables should not have nulls Attributes without a value (null) should be assigned one Example: No email? No Email Null dates should get a special flag surrogate key Foreign keys in the fact table should never be null Nulls are okay for values in the fact tables.

We do this for the business users! Synonyms & Views Synonyms and Views are logical abstractions of tables and SQL SELECT statements, respectively. For any table directly accessible by an end user a view or synonym should be used. This way you can change the underlying tables without affecting the users external dependencies (Report, Web page, etc) CREATE VIEW name AS CREATE SYNONYM name FOR Primary Keys Dimension tables should use Surrogate keys

Fact tables should use composite keys made up of dimension foreign keys and degenerate dimensions. Most surrogate keys are number sequences date surrogate keys can be of the form YYYYMMDD Surrogate keys can be used in the fact table but they increase the table size and do not improve performance. Foreign Keys Foreign keys are important. Dont devalue! FKs enforce referential integrity between the PK in the dimension table and the FK in the Fact table. This prevents you from inserting invalid data into the Fact table. If youre concerned about the performance impacts of

constraint checking, you can drop the FKs, insert the data, then reinstate the constraints with the nocheck option. The Physical Model Design Develop Standards Detailed Dimensional Model / Physical Model Development

Environment Instantiate Relational Database Develop Security, Auditing and Staging tables and Index plan Test Environment Add Aggregations

and improved Indexes Finalize database Designs Design ROLAP Database & Test / Verify Use Data Modeling Tools! Useful for documenting metadata for tables and columns. Produce reports based on the model and documentation.

Most tools generate the SQL required to create your model. The Poor mans option is Hand write the SQL Examples: Oracle SQL Developer Data Modeler SAP Power Designer CAs ERWin IBM Rational / InfoSphere Microsoft Visio Enterprise Architect MySQL Workbench

A Tour of the Kimball Detailed Dimensional Modeling Workbook Part documentation. Part data modeling tool (DMT). All Fun! Is It Time to Use an SCM? Yes. SCM Source Code Management Git, Subversion, Mercurial, CVS Time to get serious about an SCM, since youll be Generating / creating code Making lots of changes Collaborating with others concurrently CSM tools allow you to record and track changes to your code

and easily roll-back versions and collaborate with others Learn Git: http://git-scm.com/doc Handling SCDs in the Dimension Tables Type 1 = No change to table required. Type 2 = Require extra columns to your dimension table to track changes Type 3 = Each time a change is made a new column need to be added to the dimension table. Example: Type 2 Handling Type 2 is the most common SCD These columns should be added to assist with tracking, but not displayed to the end-user.

Add these columns: RowIsCurrent (yes/no) Is this the current row. RowStartDate (datetime) Start date of valid row RowEndDate (datetime) End date of valid row RowChangeReason (text) Explain why row changed Demo: Fudgemart Workbook Star vs Snowflake Star Schema is preferred over snowflake as it is easier for users to

understand. If you need to snowflake, collapse your multi-valued / outrigger dimensions into a view. Snowflaking makes it easier to attach fact tables at different grain. Demo: Fudgemart Workbook (DimEmployee + dates) Sizing Estimates Need to know how must disk youll need. Calculate row lengths for Fact & Large Dimension tables. Estimate based on sizes of data types. Come up with initial load size + scheduled ETL Assume indexes will consume as much room as the base data. A good rule of thumb total space = 3 to 4 * Star Schema Size

Build Your Development Environment Detailed Design Develop Standards Dimensional Model / Physical Model Development Environment

Instantiate Relational Database Develop Security, Auditing and Staging tables and Index plan Test Environment Add Aggregations and improved

Indexes Finalize database Designs Design ROLAP Database & Test / Verify Physical Modeling Checklist 1. Design the physical ROLAP structure (using your DMT or SQL) 2. Initial ETL Load (Not Automated with ETL Tooling) Test and verify your data in the model

3. Finalize your Source-to-Target Map: Check Naming Conventions for tables & columns Name user-accessed views & synonyms Verify data type & length of columns Re-check your SCD types Rules for replacing NULL with a default value Add columns for maintenance and auditing purposes Instantiate the ROLAP Database Youll need this before you can develop the ETL process. You dont need to focus on performance at this point because you dont know the bottlenecks. The Development environment should be separate from the test environment. Use your SCM tool to manage code changes as you make them

And update your documentation! Demo: Fudgemart Workbook, generate SQL. Add An Auditing Dimension An Audit Dimension is a special table for tracking the ETL process. Each time the ETL process is run a row is added to the audit dimension table. Each Dimension and Fact table gets two more columns InsertAuditKey Which process loaded this row UpdateAuditKey Which process changed this row most recently? Will explore this while covering ETL. Demo: Fudgemart Workbook Initial Stage + ETL

To verify your ROLAP model, youll need to populate it with data. Initial Stage and ETL are typically done with SQL Queries If the data volume is too large, use sub-sets of the source data. Youre still exploring and validating your ROLAP Star Schema. Take the lessons learned as you profile for automating the ETL process to come. Best Practices for Staging Data Always stage your data as is to avoid a dependency on the source systems. You do not want your stage data in the same database or schema as your data warehouse. Helps keep the models tidy.

On your Server, youll notice you have Stage and DW for this reason. Demo: Stage and Initial Load via ETL Security Tables Security tables are used to filter row data based on user access or group access. For example: Current user is a member of Store 102, so she only sees Sales for that store. In SQL Server we use SYSTEM_USER to Id the user. All DBMSs have a means to do this. Demo: Add Security table so managers can see only their employees timesheets. The Test Environment

Design Develop Standards Detailed Dimensional Model / Physical Model Development Environment Instantiate Relational

Database Develop Security, Auditing and Staging tables and Index plan Test Environment Add Aggregations and improved Indexes Finalize database

Designs Design ROLAP Database & Test / Verify Test Environment This is the point where end-users enter into the process. Your system will be loaded with data so you will be able to monitor usage and adjust performance accordingly. Your test environment is separate from your Development environment. It should be network accessible. Indexing Dimension & Fact Tables

If your DBMS supports bitmapped indexes, add them to your dimension tables on attributes involved in row filters. Bitmapped indexes are good for low-cardinality columns (Y/N or High, Med, Low) Supported in Oracle, not SQL Server For fact tables, follow the index plan optimizer of your DBMS. Demo: Execution Plans Aggregations Aggregate popular rollup data. Monitor queries to find out whats popular. Improves performance. DimProduct Product Key PK Product Name

Product Color Product Subcat Key Product Subcat . Rollup FactSales Date Key PK,FK Product Key PK, FK Sales Amt Sales Qty FactSalesSummary Year-Month Key PK,FK Product Subcat Key PK, FK Sales Amt

Sales Qty DimDate Date Key PK Date Name Year-Month Year-Qtr . Summary Develop standards for consistency Use data modeling tool to help document the physical design. Use a SCM tool to track changes to your design. Add to your schema to support Type 2 & 3 dimensions. Include a framework for auditing the ETL process.

Build and verify your model in Development Introduce users during the test phase. IST722 Data Warehousing Physical Design Michael A. Fudge, Jr.

Recently Viewed Presentations

  • Computer Science Education (CSE) Students by Cristal Locke

    Computer Science Education (CSE) Students by Cristal Locke

    Students love the science behind the renowned CPU. Programming, webbing, debugging, and surfing; all intrigue them through and through. These future technological geniuses will download and prevail. Using Fast ForWord, Riverdeep, FCAT Explorer, and the widespread e-mail. *AI has help...
  • The Challenge: To Create More Value in All Negotiations

    The Challenge: To Create More Value in All Negotiations

    need MicrosoftWord FONTS "Showcard Gothic," "Ravie," "Chiller" and "Verdana" Master* Excellence part one (of 3) introduction to excellence.
  • Avalição do 11º Congresso Internacional de Educação a Distância

    Avalição do 11º Congresso Internacional de Educação a Distância

    654.005 SSP-DF. Administrador. Instituto Legislativo Brasileiro / Senado Federal. Via N 2 - Unidade de Apoio nº V . 17/09/2004. O pagamento será efetivado mediante emissão da nota fiscal e dos certificados dos participantes. apos o evento. CARLOS VIEIRA. Fundação...
  • Unit 1 Academic & Literary Vocabulary

    Unit 1 Academic & Literary Vocabulary

    Unit 1 Academic & Literary Vocabulary. Turn to page 2 in your Springboard books and read through the academic & literary vocabulary terms. On the QHT graphic organizer, place the terms in the appropriate columns.
  • The Watson's Go to Birmingham - Moore Public Schools

    The Watson's Go to Birmingham - Moore Public Schools

    The Watson's Go to Birmingham. Chapters 1-4. Chapter 1 Writing Time . ... Hambone scared Momma into thinking there were no black people in Michigan. Why would this bother Momma? ... How did Kenny feel when Rufus got mad at...
  • Colonial America The Revolutionary War - Mrs. Quimby's ...

    Colonial America The Revolutionary War - Mrs. Quimby's ...

    > The army lacked food, clothing, and other supplies. "Our troops are in general almost naked and…in starving condition. All my men except 18 are unfit for duty for want of shoes, stockings, and shirts…Poor Jack has…made up his blanket...
  • 2.2.3 Enzymes - PDST

    2.2.3 Enzymes - PDST

    Q. Suggest one reason why enzymes are not found in body soap or shampoo. Adenosine Triphosphate. By the volume of suds produced. FALSE. Enzyme trapped in beads or gel so that it will react with, but not mix with, its...
  • 21 Annual st In conjunction with Earth Day

    21 Annual st In conjunction with Earth Day

    Ed Stone Park had 210 volunteers and collected 2100 lbs of garbage. If anyone knows who this watch belongs to please contact Environmental Management at 386-736-5927. Rescued by a volunteer, a baby possum.