
Dimensional Data Modeling

Module 6

Course Agenda

Rationale for dimensional modeling
Dimensional modeling basics
Dimensional modeling details
Fact table details
Dimension table details
Design process
Aggregate schemas
Multiple fact tables
Architected data marts

Rationale for Dimensional Modeling

The Business Value Chain

A series of interrelated business processes which contribute to increased product value for the customer, and to profit for the enterprise

Porter 1985

Product Development

Operations

Sales and Marketing

Customer Services

Drive to Compete

Businesses constantly strive to optimize each process in the value chain. Optimization requires measuring and analyzing the effectiveness of each process, as well as of the value chain as a whole.

The Role of Information Technology

Process optimization: supported by online transaction processing (OLTP) systems

Measuring and analyzing processes: supported by 'analytic' systems (the data warehouse)

Example OLTP Systems


Product Development: Manufacturing and Process Control
Operations: Shipping and Inventory Management
Sales and Marketing: Sales Order Entry and Campaign Management
Customer Services: Customer Support and Relationship Management

OLTP Systems & Business Events

Events are the heart of every business: book an order, print a pick list, record a cash withdrawal, post a payment.

Event detail is collected by OLTP systems, which have an atomic focus and ensure transaction consistency.

OLTP System Reporting

OLTP systems answer event-oriented questions well: run invoices, print a ledger, pull up customer detail.

Operational reporting:
Focused on detail
Predictable requirements and query patterns
Does not reveal the overall performance of a process

OLTP Design Characteristics

Focus of OLTP design: individual data elements and data relationships

Design goals: accurately model the business; remove redundancy

OLTP Design Shortcomings


Complex
Unfamiliar to business people
Incomplete history
Slow query performance

Emergence of Dimensional Model

Logical modeling technique for designing relational database structures for use in analytic systems
Addresses OLTP design shortcomings
First developed in the early 1980s in the packaged goods industry
Popularized by Ralph Kimball, PhD, in his 1996 book 'The Data Warehouse Toolkit'

Q&A

Dimensional Modeling Basics

Sample Value Chain Analysis


Process-oriented business questions from across the value chain (Product Development, Operations, Sales and Marketing, Customer Services):
"I need to see overall gross margin by category"
"What are outstanding receivables by G/L account?"
"How do inventory levels compare with sales by product and warehouse?"
"What is the return rate for each supplier?"

Measurement Focus
Process-oriented business measures found in those questions:
gross margin
receivables
inventory levels, sales
return rate

Process Measurement

Measures: metrics or indicators by which people evaluate a business process. Measures are referred to as facts.
Examples: margin, inventory amount, sales dollars, receivable dollars, return rate

Coffee Maker Fulfillment Report
Brand: Captain Coffee

Product                 Units Sold   Units Shipped   % Shipped
Standard Coffee Maker        5,000           3,800         76%
Thermal Coffee Maker         2,400           1,632         68%
Deluxe Coffee Maker          2,073           1,658         80%
All Products                 9,473           7,090         75%

In this report, Units Sold, Units Shipped and % Shipped are facts.

Perspective Focus
Process-oriented business perspectives found in those questions:
category
G/L account
product, warehouse
supplier

Process Perspectives

Dimensions: the parameters by which measures are viewed
Used to break out, filter or roll up measures
Often found after the word "by" in a business question
Descriptive business terms
Examples: product, warehouse, customer, supplier

In the Coffee Maker Fulfillment Report, Brand and Product are dimensions.

Dimensional Model

Definition

Logical data model used to represent the measures and dimensions that pertain to one or more business subject areas. Dimensional model = star schema.
Serves as the basis for the design of a relational database schema
Can easily be translated into a multi-dimensional database design if required
Overcomes OLTP design shortcomings

Dimensional Model Advantages


Understandable
Systematically represents history
Reliable join paths
High-performance queries
Enterprise scalability

Schema Simplicity

Fewer tables: denormalized, consolidated, familiar to users
Facts go in fact tables; dimensions go in dimension tables
Increases understandability
[Figure: star schema with a central Facts table joined to the Store, Time and Product dimension tables]

Data Familiarity

Adding business context:
A single source field is expanded into parts
The parts are decoded into business terms
Special indicators and flags are added
Example: the source field ord_date becomes a time dimension with year, quarter, month, date, day of the week and holiday flag
Increases understandability
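The ord_date expansion above can be sketched as a small function. This is a minimal illustration, not a prescribed load routine: the function name `time_dimension_row` and the `HOLIDAYS` set are hypothetical, and a real warehouse would read holidays from a calendar reference table.

```python
from datetime import date

# Hypothetical holiday calendar; a real load would read this from a reference table.
HOLIDAYS = {date(2001, 1, 1), date(2001, 12, 25)}


def time_dimension_row(ord_date: date) -> dict:
    """Expand a single source date (e.g. ord_date) into time-dimension attributes."""
    return {
        "year": ord_date.year,
        "quarter": f"Q{(ord_date.month - 1) // 3 + 1}",
        "month": ord_date.strftime("%B"),        # decoded into a business term
        "date": ord_date.isoformat(),
        "day_of_week": ord_date.strftime("%A"),
        "holiday_flag": "Y" if ord_date in HOLIDAYS else "N",  # special indicator
    }


print(time_dimension_row(date(2001, 1, 1)))
```

Because every attribute is derived once at load time, report writers can filter on "holiday_flag" or group by "quarter" without repeating date arithmetic in each query.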

Representing History

Time dimension (year, quarter, month, date, day of the week, holiday flag):
Part of every star schema
Marks the date when the facts (process measurements) occurred
Allows the schema to easily add and query data over time
Especially useful for performing comparison queries
[Figure: star schema with Facts joined to the Store, Time and Product dimensions]

Fewer Join Paths

Star schema joins

Defined during schema design, not at runtime
Business people can easily understand these relationships
One-to-many relationships between dimensions and facts
Referential integrity always enforced

High Performance Design

Fewer joins mean less expensive queries
Deterministic query patterns
Star schema query optimization supported by all major RDBMS vendors

Subject Area Models


[Figure: one subject area E/R model per process: Manufacturing and Process Control (Product Development), Shipping and Inventory Management (Operations), Sales Order Entry and Campaign Management (Sales and Marketing), Customer Support and Relationship Management (Customer Services)]

[Figure: a corresponding subject area dimensional model for each process]

Enterprise Models
[Figure: an enterprise scope E/R model compared with an enterprise scope dimensional model]

Exercise 1

Scenario
Industry: Automobile manufacturing
Company: Millennium Motors
Value chain focus: Sales

Sample business questions:
What are the top 10 selling car models this month?
How do this month's top 10 selling models compare to the top 10 over the last six months?
Show me dealer sales by region by model by day.
What is the total number of cars sold by month by dealer by state?

Task: list the facts and dimensions.

Exercise 1 - worksheet
30

Exercise 1 Solution

Facts: sales revenue, quantity sold

Dimensions: model name, month, dealer name, region, state, date

Q&A

Dimensional Design Details

Star Schema Dimension Tables

Dimension tables store dimension values (textual content)
Dimension tables are usually referred to simply as 'dimensions'
Spend extra effort to add dimensional attributes
[Figure: a star schema highlighting its dimension tables]

Dimension Keys

Synthetic keys:
Each dimension table is assigned a unique primary key, generated specifically for the data warehouse
Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema
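A minimal sketch of how a load process might hand out synthetic keys, assuming an in-memory mapping from (source system, source key) to warehouse key. The class `SurrogateKeyAssigner` and the source names are hypothetical; a real load would persist this mapping, typically in the dimension table itself next to the source key column.

```python
class SurrogateKeyAssigner:
    """Assign warehouse-generated (synthetic) keys to source-system keys.

    In-memory sketch only; a real load would persist this mapping.
    """

    def __init__(self) -> None:
        self._next_key = 1
        self._keys: dict[tuple[str, str], int] = {}

    def key_for(self, source_system: str, source_key: str) -> int:
        """Return the existing synthetic key, or generate the next one."""
        lookup = (source_system, source_key)
        if lookup not in self._keys:
            self._keys[lookup] = self._next_key
            self._next_key += 1
        return self._keys[lookup]


keys = SurrogateKeyAssigner()
print(keys.key_for("orders", "C-123"))   # 1
print(keys.key_for("billing", "C-123"))  # 2: same source value, different system
print(keys.key_for("orders", "C-123"))   # 1 again
```

The second call shows one reason source keys make poor star schema primary keys: two systems can reuse the same value for different things.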

Dimension Columns

Dimension attributes:
Specify the way in which measures are viewed: rolled up, broken out or summarized
Often follow the word "by", as in "Show me sales by region and quarter"
Frequently referred to as 'dimensions'

Star Schema Fact Table

Process measures:
Start by assigning one fact table per business subject area
Fact tables store the process measures (aka facts)
Compared to dimension tables, fact tables usually have a very large number of rows
[Figure: a fact table with columns fact1, fact2, fact3]

Fact Table Primary Key

Every fact table has a multi-part primary key, made up of foreign keys referencing the dimensions.
Fact table columns: key, key, key, fact1, fact2, fact3
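To make the multi-part key concrete, here is a sketch using SQLite (via Python's standard `sqlite3` module) as a stand-in for the warehouse RDBMS. The table names and the model values are hypothetical; the point is that the fact table's primary key is the combination of its dimension foreign keys, so duplicate combinations and orphan keys are both rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE time_dim   (time_key   INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT, date TEXT);
CREATE TABLE model_dim  (model_key  INTEGER PRIMARY KEY, category TEXT, line TEXT, model TEXT);
CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY, region TEXT, state TEXT, city TEXT, dealer TEXT);

-- Multi-part primary key made up of the foreign keys to each dimension.
CREATE TABLE sales_facts (
    time_key   INTEGER NOT NULL REFERENCES time_dim (time_key),
    model_key  INTEGER NOT NULL REFERENCES model_dim (model_key),
    dealer_key INTEGER NOT NULL REFERENCES dealer_dim (dealer_key),
    revenue    REAL,
    quantity   INTEGER,
    PRIMARY KEY (time_key, model_key, dealer_key)
);
""")

conn.execute("INSERT INTO time_dim VALUES (1, 1997, 'Q1', 'January', '1997-01-15')")
conn.execute("INSERT INTO model_dim VALUES (1, 'Sedan', 'Family', 'Model A')")  # hypothetical values
conn.execute("INSERT INTO dealer_dim VALUES (1, 'Northeast', 'Massachusetts', 'Boston', 'Honest Ted''s')")
conn.execute("INSERT INTO sales_facts VALUES (1, 1, 1, 75840.27, 2)")

for bad_row in [(1, 1, 1, 100.0, 1),     # same key combination twice
                (1, 99, 1, 100.0, 1)]:   # model_key 99 has no dimension row
    try:
        conn.execute("INSERT INTO sales_facts VALUES (?, ?, ?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```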

Fact Table Sparsity

Sparsity: term used to describe the very common situation where a fact table does not contain a row for every combination of every dimension table row for a given time period. Because fact tables contain a very small percentage of all possible combinations, they are said to be "sparsely populated" or "sparse".

Fact Table Grain

Grain: the level of detail represented by a row in the fact table
Must be identified early
The greatest source of confusion during the design process
Example: each row in the fact table represents the daily item sales total

Sparsity Example

Assume 5,000 rows in the 'dealer' dimension and 50 rows in the 'model' dimension.

If all dealers sold all models every day:
5,000 * 50 = 250,000 sales every day
250,000 * 365 = 91,250,000 sales every year
(counting only one sale of each model at every dealer!)

In practice, only a small fraction of the 250,000 possible combinations will be sold on a given day. Generally, record only actual sales in the fact table, not zeroes.
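The arithmetic above, plus a fill ratio showing how sparse a real day might be. The 4,000 "actually sold" combinations are a hypothetical figure chosen only for illustration.

```python
dealers = 5_000
models = 50
days_per_year = 365

possible_per_day = dealers * models                  # every dealer sells every model
possible_per_year = possible_per_day * days_per_year


def fill_ratio(actual_rows: int, possible_rows: int) -> float:
    """Fraction of possible dimension combinations that actually have a fact row."""
    return actual_rows / possible_rows


print(possible_per_day, possible_per_year)           # 250000 91250000
# Hypothetical: 4,000 dealer/model combinations actually sold on a given day.
print(f"{fill_ratio(4_000, possible_per_day):.1%}")  # 1.6%
```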

Designing a Star Schema


Five initial design steps, based on Kimball's six steps
Start designing in order
Revisit and adjust over the life of the project

Step One

Identify fact table


Start by naming the fact table with the name of the business subject area

Step Two

Identify fact table grain


Describe what a row in the fact table represents - in business terms

Step Three

Identify dimensions

Step Four

Select facts

Step Five

Identify dimensional attributes

Exercise 2

Scenario
Industry: Automobile manufacturing
Company: Millennium Motors
Value chain focus: Sales

Sample business questions:
What are the top 10 selling car models this month?
How do this month's top 10 selling models compare to the top 10 over the last six months?
Show me dealer sales by region by model by day.
What is the total number of cars sold by month by dealer by state?

Exercise 2 - continued

Using these source data elements, design a star schema that answers the proposed business questions:

Sales revenue, quantity sold, model name, dealer name, dealer city, product line, region where sold, state, vehicle category, month, date of sales

Exercise 2 sample data

50

Exercise 2 - worksheet


Exercise 2 - solution

Step 1 - Fact table name: 'Sales Facts'
Step 2 - Fact table grain: every row in the Sales Facts table is a summary of car model sales for that day at a single dealer
Step 3 - Dimensions: Time, Model, Dealer
Step 4 - Facts: total revenue, quantity sold
Step 5 - Dimensional attributes: see the Exercise 2 Dimensional Model

Exercise 2 Dimensional Model


Time: time_key, year, quarter, month, date
Model: model_key, category, line, model
Dealer: dealer_key, region, state, city, dealer
Sales Facts: model_key, dealer_key, time_key, revenue, quantity

Q&A

Fact Table Details

Example Fact Table

Sales Facts: model_key, dealer_key, time_key, revenue, quantity

Example Fact Table Records


Sales Facts
time_key  model_key  dealer_key    revenue  quantity
1         1          1            75840.27  2
1         2          1           152260.37  3
1         3          1            28360.15  1
1         4          1           132675.22  4
1         5          1            43789.45  1
1         1          2            35678.98  1
1         3          2            57864.78  2
1         5          2            92876.67  2

time_key, model_key and dealer_key form the primary key; revenue and quantity are the facts.

Facts

Fully additive:
Can be summed across any and all dimensions
Stored in the fact table
Examples: revenue, quantity
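A quick check of full additivity using the example Sales Facts records: quantity can be rolled up by dealer or by model, and either route gives the same grand total. The helper `total_by` is illustrative only; in practice this would be a SQL GROUP BY.

```python
from collections import defaultdict

# (time_key, model_key, dealer_key, revenue, quantity), from the example fact table
sales_facts = [
    (1, 1, 1, 75840.27, 2), (1, 2, 1, 152260.37, 3), (1, 3, 1, 28360.15, 1),
    (1, 4, 1, 132675.22, 4), (1, 5, 1, 43789.45, 1), (1, 1, 2, 35678.98, 1),
    (1, 3, 2, 57864.78, 2), (1, 5, 2, 92876.67, 2),
]


def total_by(rows, key_index):
    """Sum quantity grouped by the dimension key found at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[4]
    return dict(totals)


by_dealer = total_by(sales_facts, 2)
by_model = total_by(sales_facts, 1)
grand_total = sum(row[4] for row in sales_facts)
# A fully additive fact gives the same total whichever dimension is rolled up first.
assert sum(by_dealer.values()) == sum(by_model.values()) == grand_total
print(by_dealer, grand_total)  # {1: 11, 2: 5} 16
```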

Example: Additive Facts


Time: time_key, year, quarter, month, date
Model: model_key, brand, category, line, model
Dealer: dealer_key, region, state, city, dealer
Sales Facts: model_key, dealer_key, time_key, revenue, quantity (both facts are fully additive)

Facts

Semi-additive:
Can be summed across most dimensions, but not all
Examples: inventory quantities, account balances, personnel counts; anything that measures a level
Be careful with ad-hoc reporting
Often aggregated across the forbidden dimension by averaging

Example: Semi-additive Facts


Time: time_key, year, quarter, month, date
Model: model_key, brand, category, line, model
Dealer: dealer_key, region, state, city, dealer
Sales Facts: model_key, dealer_key, time_key, inventory (semi-additive: cannot be summed across time)
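A sketch of handling a semi-additive inventory level, using hypothetical month-end snapshots for two dealers. Summing across dealers gives a valid company-wide level; across time, the levels are averaged instead of summed.

```python
# Hypothetical month-end inventory levels: (month, dealer, cars on hand)
inventory = [
    ("Jan", "Boston", 10), ("Jan", "Tucson", 4),
    ("Feb", "Boston", 6),  ("Feb", "Tucson", 6),
    ("Mar", "Boston", 8),  ("Mar", "Tucson", 8),
]
months = ["Jan", "Feb", "Mar"]


def on_hand(month: str) -> int:
    """Summing across dealers is valid: the company-wide level for one month."""
    return sum(units for m, _, units in inventory if m == month)


monthly_levels = [on_hand(m) for m in months]
# Summing across time (42) would count the same cars several times,
# so the time dimension is rolled up by averaging instead.
quarter_average = sum(monthly_levels) / len(monthly_levels)
print(monthly_levels, quarter_average)  # [14, 12, 16] 14.0
```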

Facts

Non-Additive

Cannot be summed across any dimension
All ratios are non-additive
Break them down into fully additive components and store those in the fact table

Example: Non-Additive Facts


Model: model_key, brand, category, line, model
Sales Facts: model_key, dealer_key, time_key, revenue, margin_amt
Time: time_key, year, quarter, month, date
Dealer: dealer_key, region, state, city, dealer

Margin_rate is non-additive: margin_rate = margin_amt / revenue. Store the additive components (margin_amt and revenue) and compute the rate after summing.
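Why the components are stored instead of the ratio: with two hypothetical dealer rows of very different size, averaging the per-row margin rates gives a different answer from recomputing the rate from summed components.

```python
# Hypothetical (revenue, margin_amt) rows for two dealers
rows = [(100_000.0, 20_000.0), (10_000.0, 5_000.0)]


def margin_rate(rows):
    """Re-derive the ratio from additive components: sum(margin) / sum(revenue)."""
    return sum(m for _, m in rows) / sum(r for r, _ in rows)


naive = sum(m / r for r, m in rows) / len(rows)  # averaging stored ratios: misleading
print(round(margin_rate(rows), 4), round(naive, 4))  # 0.2273 0.35
```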

Unit Amounts

Unit price, unit cost, etc.:
Are numeric, but are not measures
Store the extended amounts, which are additive
Unit amounts may be useful as dimensions for price point analysis
Unit values may be stored to save space

Factless Fact Table


A fact table with no measures in it
Nothing to measure, except the convergence of dimensional attributes
Sometimes a 1 is stored for convenience
Examples: attendance, customer assignments, coverage

Q&A

Dimension Table Details

Example Dimension Tables


Time: time_key, year, quarter, month, date
Model: model_key, brand, category, line, model
Dealer: dealer_key, region, state, city, dealer

Example Dimension Table Records


Time Dimension
time_key  year  quarter  month    date
1         1997  Q1       January  1/15/97
2         1997  Q1       January  1/16/97
3         1997  Q1       January  1/17/97
150       1997  Q2       April    4/1/97
777       1998  Q4       October  10/13/98

time_key is the synthetic key; the remaining columns are attributes.

Example Dimension Table Records


Dealer Dimension
dealer_key  region     state          city       dealer
1           Northeast  Massachusetts  Boston     Honest Ted's
2           Northeast  Massachusetts  Boston     Stoller Co.
3           Southwest  Arizona        Tucson     Wright Motors
12          Southwest  California     San Diego  American
245         Central    Illinois       Chicago    Lugwig Motors

dealer_key is the synthetic key; the remaining columns are attributes.

Dimension Tables

Characteristics

Hold the dimensional attributes
Usually have a large number of attributes (wide)
Add flags and indicators that make it easy to perform specific types of reports
Have a small number of rows compared to fact tables (most of the time)

Don't Normalize Dimensions

Saves very little space
Impacts performance
Can confuse matters when multiple hierarchies exist
A star schema with normalized dimensions is called a "snowflake schema"
Usually advocated by software vendors whose products require a snowflake schema for performance

Example Snowflake Schema


Sales Facts: model_key, dealer_key, time_key, revenue, quantity

Model: model_key, model, line_key
Line: line_key, line, category_key
Category: category_key, category, brand_key
Brand: brand_key, brand

Day: date_key, date, month_key
Month: month_key, month, quarter_key
Quarter: quarter_key, quarter, year_key
Year: year_key, year

Dealer: dealer_key, dealer, city_key
City: city_key, city, state_key
State: state_key, state, region_key
Region: region_key, region

Slowly Changing Dimensions


Dimension source data may change over time
Relative to fact tables, dimension records change slowly
Allows dimensions to have multiple 'profiles' over time to maintain history
Each profile is a separate record in a dimension table

Slowly Changing Dimension Example

Example: a woman gets married

Possible changes to the customer dimension: last name, marital status, address, household income

Existing facts need to remain associated with her single profile
New facts need to be associated with her married profile

Slowly Changing Dimension Types

Three types of slowly changing dimensions:

Type 1: updates the existing record with modifications; does not maintain history
Type 2: adds a new record and keeps the old record; maintains history
Type 3: keeps old and new values in the existing row; requires a design change

Designing Loads to Handle SCD

Design and implementation guidelines

Gather SCD requirements when designing data mapping and loading
SCD needs to be defined and implemented at the dimensional attribute level
Each column in a dimension table needs to be identified as a Type 1 or a Type 2 SCD
If one Type 1 column changes, all Type 1 columns are updated
If one Type 2 column changes, a new record is inserted into the dimension table
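A minimal sketch of these column-level rules, assuming the column assignments from the Type 2 example that follows (Name and Marital Status are Type 2, Home Income is Type 1). The `current` flag, the function name `apply_change`, and the choice to overwrite Type 1 columns on the current profile only are all assumptions of this sketch; some designs overwrite Type 1 columns on every profile of the customer.

```python
from dataclasses import dataclass, replace

# Per-column SCD types (assumed, following the Type 2 example).
TYPE2_COLUMNS = ("name", "marital_status")
TYPE1_COLUMNS = ("home_income",)


@dataclass(frozen=True)
class CustomerRow:
    cust_key: int          # synthetic warehouse key
    cust_id: str           # source (OLTP) key
    name: str
    marital_status: str
    home_income: str
    current: bool = True   # hypothetical flag marking the profile new facts use


def apply_change(dim: list, incoming: dict, next_key: int) -> list:
    """Apply one OLTP record to the customer dimension and return the new rows.

    Assumes the customer already has a current row in the dimension.
    """
    cur = next(r for r in dim if r.cust_id == incoming["cust_id"] and r.current)

    def changed(columns):
        return any(getattr(cur, c) != incoming[c] for c in columns)

    if changed(TYPE2_COLUMNS):
        # A Type 2 change: expire the old profile and insert a new one with a new key.
        values = {c: incoming[c] for c in TYPE2_COLUMNS + TYPE1_COLUMNS}
        new_row = CustomerRow(next_key, cur.cust_id, **values)
        return [replace(r, current=False) if r == cur else r for r in dim] + [new_row]
    if changed(TYPE1_COLUMNS):
        # Only Type 1 changes: overwrite all Type 1 columns, keeping no history.
        values = {c: incoming[c] for c in TYPE1_COLUMNS}
        return [replace(r, **values) if r == cur else r for r in dim]
    return dim


dim = [CustomerRow(1, "123", "Sue Jones", "S", "$30K")]
married = {"cust_id": "123", "name": "Sue Smith", "marital_status": "M", "home_income": "$60K"}
for row in apply_change(dim, married, next_key=2):
    print(row)
```

Running it produces two profiles for customer 123: the expired single profile (key 1) and the new married profile (key 2) that new facts should reference.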

Designing Loads to Handle SCD

Design and implementation guidelines

For large dimension tables, change data capture techniques may be used to minimize the data volume
For smaller dimension tables, compare all OLTP records with the dimension table records
Balance data volume against the complexity of change data capture logic

Designing Loads to Handle SCD

Type 1 example: a woman gets married

Customer Dimension Table


Column Name      SCD Type
Customer Key     N/A
Customer ID      1
Name             1
Marital Status   1
Home Income      1

Type 1 Example
OLTP: Customer OLTP
Cust ID  Name       Marital Status  Home Income
123      Sue Jones  S               $30K

Star Schema: Customer Dim
Cust Key  Cust ID  Name       Marital Status  Home Income  Status
1         123      Sue Jones  S               $30K         0

Sales Facts
Cust Key  Day Key  Sales
1         1        $40

Day Dim
Day Key  Business Date
1        1/31/01

Sue gets married 2/1/01

Customer OLTP
Cust ID  Name       Marital Status  Home Income
123      Sue Smith  M               $60K

Customer Dim
Cust Key  Cust ID  Name       Marital Status  Home Income  Status
1         123      Sue Smith  M               $60K         0

Sales Facts
Cust Key  Day Key  Sales
1         1        $40
1         2        $50

Day Dim
Day Key  Business Date
1        1/31/01
2        2/01/01

Type 1 Example

Observations

Customer history is not maintained in the OLTP system
Customer history is not maintained in the star schema
Sue has only one customer 'profile' in the customer dimension table
Sue's sales facts across all history are associated with her married profile
The association between Sue's earlier sales facts and her single profile has been lost

Designing Loads to Handle SCD

Type 2 example: a woman gets married

Customer Dimension Table


Column Name      SCD Type
Customer Key     N/A
Customer ID      2
Name             2
Marital Status   2
Home Income      1

Type 2 Example
OLTP: Customer OLTP
Cust ID  Name       Marital Status  Home Income
123      Sue Jones  S               $30K

Star Schema: Customer Dim
Cust Key  Cust ID  Name       Marital Status  Home Income  Status
1         123      Sue Jones  S               $30K         0

Sales Facts
Cust Key  Day Key  Sales
1         1        $40

Day Dim
Day Key  Business Date
1        1/31/01

Sue gets married 2/1/01

Customer OLTP
Cust ID  Name       Marital Status  Home Income
123      Sue Smith  M               $60K

Customer Dim
Cust Key  Cust ID  Name       Marital Status  Home Income  Status
1         123      Sue Jones  S               $30K         1
2         123      Sue Smith  M               $60K         0

Sales Facts
Cust Key  Day Key  Sales
1         1        $40
2         2        $50

Day Dim
Day Key  Business Date
1        1/31/01
2        2/01/01

Type 2 Example

Type 2 Observations

Customer history is not maintained in the OLTP system
Customer history is maintained in the star schema
Sue has two 'profiles' in the customer dimension
Sue's sales facts may be analyzed for when she was single, when she was married, and across all history (by using the customer ID field)
Home income was updated in the new profile record

Slowly Changing Dimension Advice

'When in doubt, design type 2'


Rapidly Changing Dimension (RCD)


Attribute values change rapidly over time. There is no yardstick for telling whether a dimension is slowly changing or not; this is a judgment call for the data modeler. An SCD may become an RCD over time, or vice versa.

Large Dimensions
Dimensions containing several million records

How to support them:
Use a database whose indexing technology supports rapid browsing
Find and suppress duplicate entries in the dimension (e.g. name and address matching)
Never use Type 2 (adding records) to handle changes in these dimensions

Rapidly Changing Monster Dimensions


Dimensions containing more than 100 million records

How to support them:
Break the monster dimension into separate dimension tables
Keep the constant information in the original table
The new dimension table can have discrete values for each attribute
Choose a pre-defined set of values per attribute

Indexing

Bitmap indexes on the foreign key columns in the fact tables
Bitmap indexes on low-cardinality columns in dimension tables, such as month, product category, store category
B-tree indexes on dimension key columns

Rapidly Changing Monster Dimensions

Build the data in this dimension with all possible combinations of values for each attribute
Identify each combination uniquely
Every time an event occurs and is recorded in the fact table, attach the unique combination ID to it
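A sketch of building such a dimension with Python's `itertools.product`. The three banded attributes and their values are hypothetical; the technique is to enumerate every combination once, key each combination, and look up that key when a fact row is recorded.

```python
from itertools import product

# Hypothetical banded attributes split out of a monster customer dimension.
BANDS = {
    "age_band": ["<25", "25-44", "45-64", "65+"],
    "income_band": ["<30K", "30-60K", ">60K"],
    "purchase_band": ["low", "medium", "high"],
}

# One row per combination of band values, each with a unique key starting at 1.
mini_dimension = {
    combo: key
    for key, combo in enumerate(product(*BANDS.values()), start=1)
}


def profile_key(age_band: str, income_band: str, purchase_band: str) -> int:
    """Key attached to each new fact row for the customer's profile at event time."""
    return mini_dimension[(age_band, income_band, purchase_band)]


print(len(mini_dimension))                   # 36
print(profile_key("25-44", ">60K", "high"))  # 18
```

Because the dimension holds only 4 x 3 x 3 rows, a customer's changing profile never adds rows; each fact simply carries the key of whatever combination applied at that moment.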

Example RCD


Degenerate Dimensions

Dimensions with no other place to go
Stored in the fact table
Are not facts
Common examples include invoice numbers or order numbers

Example Degenerate Dimension

Junk/Dirty Dimension

A convenient grouping of random flags and attributes. After all the dimensions have been carved out, some flags or text attributes are left over in the fact table that do not belong in any of the dimension tables.

Junk/Dirty Dimension

Alternatives to be avoided:
Leaving the flags and attributes unchanged in the fact table record
Making each flag and attribute into its own separate dimension
Stripping all of these flags and attributes out of the design

Instead, make a convenient grouping of the flags and attributes to move them out of the fact table into a useful dimensional framework.

Drilling

Drilling down: adding dimensional detail; further breaks out a measure in some way.

Quarterly Auto Sales Summary
Region     Units Sold  Revenue
Northeast
Southeast
Central
Northwest
Southwest

drills down to:

Quarterly Auto Sales Summary
Region     State          Units Sold  Revenue
Northeast  Maine
           New York
           Massachusetts
Southeast  Florida
           Georgia
           Virginia

Rolling up: removing dimensional detail; rolls up a measure. Rolling up the state-level summary returns the region-level summary.
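Drilling down and rolling up amount to adding or removing grouping levels. A sketch with hypothetical quarterly unit sales (the figures are invented; the report above shows no values):

```python
from collections import Counter

# Hypothetical quarterly sales rows: (region, state, units_sold)
sales = [
    ("Northeast", "Maine", 120), ("Northeast", "New York", 410),
    ("Northeast", "Massachusetts", 300), ("Southeast", "Florida", 520),
    ("Southeast", "Georgia", 260), ("Southeast", "Virginia", 190),
]
LEVEL_INDEX = {"region": 0, "state": 1}


def units_by(levels):
    """Sum units at the requested hierarchy levels; more levels means drilled down."""
    totals = Counter()
    for row in sales:
        totals[tuple(row[LEVEL_INDEX[level]] for level in levels)] += row[2]
    return dict(totals)


drilled_down = units_by(["region", "state"])
rolled_up = units_by(["region"])
print(rolled_up)  # {('Northeast',): 830, ('Southeast',): 970}
```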

Drilling

Drilling across:
A query that involves more than one fact table
Not necessarily an action that changes how a user is looking at the data
Best resolved by multiple SQL passes
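A sketch of the multi-pass approach: each pass queries one fact table at the same dimensional grain, and the results are then merged on the shared dimension values. The pass results below are hypothetical dictionaries that reuse the Coffee Maker Fulfillment Report figures purely for illustration; in practice each would come from its own SQL query.

```python
# Results of two separate passes, one per fact table, each grouped by product.
sales_pass = {"Standard": 5000, "Thermal": 2400, "Deluxe": 2073}      # units sold
shipments_pass = {"Standard": 3800, "Thermal": 1632, "Deluxe": 1658}  # units shipped


def drill_across(*passes: dict) -> dict:
    """Merge per-pass results on the shared dimension value (a full outer join)."""
    keys = sorted(set().union(*passes))
    return {k: tuple(p.get(k) for p in passes) for k in keys}


for product, (sold, shipped) in drill_across(sales_pass, shipments_pass).items():
    print(product, sold, shipped, f"{shipped / sold:.0%}")
```

Merging after separate passes avoids joining two large fact tables directly, which would multiply rows whenever both tables have several rows per product.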

Q&A

Dimensional Design Process

Project Context

Data Mart Development

Dimensional modeling is a critical part of the data mart development effort, which runs through three phases:

Design phase: determine requirements and design the schema
Development phase: iterative build and feedback
Deployment phase: automate the load, document, train users

Project Deliverables

Design: project definition document, project plan, schema design, mapping document, report design
Development: populated data mart, load routines (Sagent Plans), query and reporting environment
Deployment: automation, documentation, training materials

Project Approach

The dimensional model is developed during the design phase
The scope of the project has already been determined

Design Stage Activities

Gather requirements through requirements workshops
Develop the star schema
Conduct a design review

Gather Requirements

Requirements definition: user workshops, spreadsheets, sample reports

Source systems analysis: DBA interviews, copybooks, E/R diagrams

Design Deliverables

Deliverables: the star schema itself and the load mapping document

How these primary components are delivered depends on needs and the format chosen: modeling tools, spreadsheets, text documents

Notation Example

IDEF1X: dependent entities are fact tables; independent entities are dimension tables.
[Figure: Sales Facts (time_key, model_key, dealer_key) drawn as a dependent entity of the Time (time_key), Model (model_key) and Dealer (dealer_key) dimensions]

Notation Example

Martin IE: entities are fact or dimension tables; attributes are not shown.
[Figure: Sales Facts linked to Time, Model and Dealer]

Notation Example

Kimball: simple structure; cardinality implied.
[Figure: Sales Facts (time_key, model_key, dealer_key) joined to Time (time_key), Model (model_key) and Dealer (dealer_key)]

Design Naming Standards

Responsibility of data administration, extended to the data warehouse
Important to start early in the project
Suggested conventions for: fact tables, dimension tables, aggregate tables, keys

Data Element Definitions

Clear descriptions of facts, calculated formulae, dimensional attributes, multiple meanings/synonymous terms, and aliases

Data Element Instances

Examples of data as it will exist in the warehouse, after decoding
Adds to model understanding
Removes ambiguity and uncertainty

Data Element Mapping

Where is the data coming from? Source system, table, column, record, field

Data Transformation

Changing the data; serves as the specification for the ETL process:
Decodes
Type conversion
Conditional logic
Handling of NULLs
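A transformation specification often ends up as per-column rules like the ones below. This is a sketch only: the source field names `mar_cd` and `income`, the decode table, the "Unknown" default and the 60,000 threshold are all hypothetical, chosen to show one example of each rule type listed above.

```python
# Hypothetical decode table for a source marital-status code.
MARITAL_DECODE = {"S": "Single", "M": "Married"}


def transform(source: dict) -> dict:
    """Apply decode, type conversion, conditional and NULL rules to one source record."""
    income_raw = source.get("income")
    has_income = income_raw not in (None, "")
    return {
        # Decode: cryptic source code to business term; NULL or unknown codes get a default
        "marital_status": MARITAL_DECODE.get(source.get("mar_cd"), "Unknown"),
        # Type conversion: text amount to integer dollars; NULL stays NULL (None)
        "home_income": int(income_raw) if has_income else None,
        # Conditional logic: derive a flag from another field
        "high_income_flag": "Y" if has_income and int(income_raw) >= 60000 else "N",
    }


print(transform({"mar_cd": "M", "income": "60000"}))
print(transform({"mar_cd": None, "income": None}))
```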

Q&A

Aggregate Schemas

Aggregate Designs

Aggregates: pre-stored fact summaries along one or more dimensions; the most effective tool for improving performance

Examples: summary of sales by region, by product, by category; monthly sales

Aggregate Background

Aggregate rationale: improve end-user query performance; reduce required CPU cycles; a powerful cost-saving tool

Restrictions: additive facts only; must use a dimensional design

Aggregate Guidelines

Don't start with aggregates
Design and build them based on usage
Sooner or later you'll need to build aggregates

Aggregate Types

Separate tables:
Separate fact table for every aggregate
Separate dimension table for every aggregate dimension
Same number of fact records as level field tables

Advantages: removes the possibility of double counting; schema clarity
Caveat: requires software with aggregate navigation capability

Separate Tables
One-way aggregate
Sales Facts: time_key, product_key, market_key, Quantity, Amount
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Mthly Sales Facts Agg: month_key, product_key, market_key, Quantity, Amount
Month: month_key, Year, Fiscal Period, Month
Product: product_key, Category, Brand, Product, Diet Indicator
Market: market_key, Region, District, State, City

Separate Tables
Two-way aggregate
Sales Facts: time_key, product_key, market_key, Quantity, Amount
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Product: product_key, Category, Brand, Product, Diet Indicator
Mnthly Cat Sales Facts Agg: month_key, category_key, market_key, Quantity, Amount
Month: month_key, Year, Fiscal Period, Month
Category: category_key, Category
Market: market_key, Region, District, State, City

Aggregate Pitfalls

Sparsity failure: term used to describe the result of building too many aggregate fact tables that do not summarize enough rows. When sparsity failure occurs, a relatively small star schema can grow (in terms of disk size) thousands of times. Sparsity failure = aggregate explosion.

Aggregate Design Guidelines

Rule of twenty: to avoid aggregate explosion, make sure each aggregate record summarizes 20 or more lower-level records.

Remember: the total number of possible fact tables in any given dimensional model = the Cartesian product of all levels in all the dimensions.

Hierarchies & Aggregate Design

Hierarchy diagram: helps visualize options for building aggregates; adding cardinalities ensures the rule of twenty is followed. Not required to build the initial star schema.

Time (5 years of data):
Year: 5 years (1 per year)
Quarter: 20 quarters (4 per year)
Month: 60 months (12 per year)
Date: 1,825 days (365 per year)
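Using the time hierarchy cardinalities, a rule-of-twenty check can be sketched as below. It assumes dense data (every lower-level member actually has a row); real, sparse fact tables summarize fewer rows per aggregate row, so treat these ratios as upper bounds. The level counts passed to `prod` (5 for Dealer, 4 for Model, 5 for Time, each including "All") follow the Exercise 3 hierarchies and are used only to illustrate the Cartesian-product count.

```python
from math import prod

# Members per level of the time hierarchy (five years of data).
time_members = {"year": 5, "quarter": 20, "month": 60, "date": 1825}


def rows_per_aggregate_row(members: dict, lower: str, upper: str) -> float:
    """Average lower-level rows rolled into each aggregate row (dense-data assumption)."""
    return members[lower] / members[upper]


for lower, upper in [("date", "month"), ("month", "quarter"), ("date", "quarter")]:
    ratio = rows_per_aggregate_row(time_members, lower, upper)
    verdict = "meets" if ratio >= 20 else "fails"
    print(f"{lower} -> {upper}: {ratio:.1f} rows each, {verdict} the rule of twenty")

# Possible fact tables = Cartesian product of the level counts of every dimension.
print(prod([5, 4, 5]))  # 100
```

A month-to-quarter aggregate summarizes only 3 rows per row, which is exactly the kind of aggregate that drives sparsity failure.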

Aggregate Navigation

Description

Function provided by a software layer, the aggregate navigator
Directs user queries to the most favorable available aggregate
Transparent to the end user
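A toy aggregate navigator, to show the decision it makes. The catalogue, table names and row counts are hypothetical. The rule sketched here: a table can answer a query if, in every dimension the query groups by, the table is stored at or below the requested level; among those candidates, pick the one with the fewest rows. Dimensions that the query does not mention are summed to "All", which any table can do.

```python
# Hypothetical catalogue: the level each table stores per dimension, and its size.
TABLES = [
    {"name": "sales_facts", "rows": 9_000_000,
     "levels": {"time": "date", "dealer": "dealer", "model": "model"}},
    {"name": "agg_state_month_model", "rows": 120_000,
     "levels": {"time": "month", "dealer": "state", "model": "model"}},
]
# Levels ordered from most detailed to least detailed.
HIERARCHIES = {
    "time": ["date", "month", "quarter", "year"],
    "dealer": ["dealer", "city", "state", "region"],
    "model": ["model", "line", "category"],
}


def can_answer(table: dict, query_levels: dict) -> bool:
    """True if the table is at or below the requested level in every queried dimension."""
    return all(
        HIERARCHIES[dim].index(table["levels"][dim]) <= HIERARCHIES[dim].index(level)
        for dim, level in query_levels.items()
    )


def navigate(query_levels: dict) -> str:
    """Return the smallest table able to answer the query."""
    candidates = [t for t in TABLES if can_answer(t, query_levels)]
    return min(candidates, key=lambda t: t["rows"])["name"]


print(navigate({"time": "quarter", "dealer": "region"}))  # agg_state_month_model
print(navigate({"time": "date", "dealer": "region"}))     # sales_facts
```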

Aggregate Framework

[Figure: the aggregate framework seen from a business view and from a designer view]

Aggregate Architecture
[Figure: three placements for aggregate-aware SQL: on the client PC in front of the RDBMS; on an application server between the client PCs and the RDBMS; inside the RDBMS itself, with client PCs sending plain SQL]

Aggregate Deployment

Incremental
Based on usage
Transparent to users
Typically the warehouse DBA's responsibility

Aggregate Deployment

Build subject area 1, no aggregates
Build subject area 2, no aggregates; build aggregates for subject area 1
Build subject area 3, no aggregates; build aggregates for subject area 2
Build subject area 4, no aggregates; build aggregates for subject area 3
Some rework required

Exercise 3

Scenario

Given the original star schema and the following hierarchy, design a two-way aggregate table structure that will drastically improve performance. Make your own assumptions about summary levels.

Exercise 3 Dimensional Model


Time: time_key, year, quarter, month, date
Model: model_key, category, line, model
Dealer: dealer_key, region, state, city, dealer
Sales Facts: model_key, dealer_key, time_key, revenue, quantity

Exercise 3

Scenario
Industry: Automobile manufacturing
Company: Millennium Motors
Value chain focus: Sales

Sample business questions:
What are the top 10 selling car models this month?
How do this month's top 10 selling models compare to the top 10 over the last six months?
Show me dealer sales by region by model by day.
What is the total number of cars sold by month by dealer by state?

Millennium Motors' dimensions (members per level, from the hierarchy diagram):
Dealer: All > Region (5) > State (50) > City (1000) > Dealer name (1000)
Model: All > Category (10) > Line (20) > Model name (40)
Time: All > Year > Quarter (20) > Month (60) > Date (1825)

Exercise 3 Worksheet

Exercise 3 Solution
Agg Sales Facts: state_key, month_key, model_key, revenue, quantity
State: state_key, region, state
Month: month_key, year, quarter, month

Sales Facts: model_key, dealer_key, time_key, revenue, quantity
Time: time_key, year, quarter, month, date
Model: model_key, category, line, model
Dealer: dealer_key, region, state, city, dealer

Q&A

Multiple Fact Tables

Different business processes usually require different fact tables
There are also several cases where a single business process requires multiple fact tables: core and custom, snapshot and transaction, coverage, aggregates

Different Business Processes

Different business processes usually require different fact tables
In practice, it may be hard to identify what a process is
Sometimes you can spot different processes because measures are recorded with different dimensions or at differing grains

Different Dimensions or Grain


Sales Facts: time_key, product_key, market_key, Quantity, Amount
Shipment Facts: time_key, product_key, shipper_key, market_key, Quantity, Weight
Shipper: shipper_key, name, type, mode, address
Product: product_key, Category, Brand, Product, Diet Indicator
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Market: market_key, Region, District, State, City

Different Dimensions or Grain

Don't take shortcuts with grain

The 'not applicable' dimension value: using a 'not applicable' row in a dimension confuses the grain and can introduce reporting difficulties

Different Points in Time

Sometimes it is not easy to identify the discrete business processes
All measures may have the same dimensionality or grain
Different measures are recorded at different times
Example: quantity sold is not recorded at the same time as quantity shipped

Different Timing

Building a single fact table would require recording zero or null for measures that are not applicable at a point in time
Reports would contain a confusing combination of zeros, nulls, and absence of data

Different Timing - One Fact Table


Sales and Shipment Facts: time_key, product_key, market_key, Quantity_sold, Amount_sold, Quantity_shipped, Amount_shipped (the shipped measures will initially be null)
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Product: product_key, Category, Brand, Product, Diet Indicator
Market: market_key, Region, District, State, City

Different Timing - Two Fact Tables


Sales Facts: time_key, product_key, market_key, Quantity, Amount
Shipment Facts: time_key, product_key, market_key, Quantity, Amount
Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Product: product_key, Category, Brand, Product, Diet Indicator
Market: market_key, Region, District, State, City

Identifying Different Processes


Look at the measures in question Sort them into fact tables based on

Dimensions Grain Differing timings of events measured


One Process, Multiple Fact Tables


Core and custom
Coverage
Snapshot and transaction
Aggregates


Core and Custom Schemas

There is a set of dimension attributes and measures shared in all cases.
Depending on the value in a dimension, certain extra dimension attributes or measures are recorded.

Examples: heterogeneous products, types of customers.


Core and Custom


Core:
Account Facts: time_key, product_key, branch_key, customer_key, Balance, Transaction_count
Product: product_key, ...

Custom:
Checking Account Facts: time_key, checking_key, branch_key, customer_key, Balance, Transaction_count, ...custom checking facts
Checking Account: checking_key, ...custom checking attributes

Shared by both fact tables:
Time: time_key, ...
Branch: branch_key, ...
Customer: customer_key, ...


Core and Custom

Core fact table and dimensions
Contain all attributes that are shared no matter what.
Appropriate for analysis across the entire subject area.

Custom fact table and/or dimensions
Contain attributes specific to a particular dimension value (e.g. Checking).
Only appropriate when the business question is limited to that particular dimension value.
Should repeat the shared facts to minimize the need to access two fact tables.
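A minimal sketch of how the core and custom tables divide the work, using Python's sqlite3 module. The rows and the checks_written column are hypothetical stand-ins for "...custom checking facts"; only the shared columns come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core fact table: facts shared by every account type.
CREATE TABLE account_facts (time_key INTEGER, product_key INTEGER,
                            branch_key INTEGER, customer_key INTEGER,
                            balance REAL, transaction_count INTEGER);
-- Custom fact table for checking: repeats the shared facts and adds a
-- hypothetical checking-only fact (checks_written).
CREATE TABLE checking_account_facts (time_key INTEGER, checking_key INTEGER,
                                     branch_key INTEGER, customer_key INTEGER,
                                     balance REAL, transaction_count INTEGER,
                                     checks_written INTEGER);
INSERT INTO account_facts VALUES
  (1, 1, 7, 501, 1200.0, 14),   -- hypothetical checking account
  (1, 2, 7, 502, 9000.0, 1);    -- hypothetical savings account
INSERT INTO checking_account_facts VALUES
  (1, 1, 7, 501, 1200.0, 14, 9);
""")

# Question across the whole subject area: the core table alone answers it.
total_balance = conn.execute(
    "SELECT SUM(balance) FROM account_facts").fetchone()[0]

# Checking-only question: because the shared facts are repeated in the
# custom table, no join back to the core fact table is needed.
checking = conn.execute(
    "SELECT SUM(balance), SUM(checks_written) FROM checking_account_facts"
).fetchone()
print(total_balance, checking)
```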

Coverage Schema

A star schema usually measures events that happen.
Relationships between the dimensions involved are not captured if the events do not happen.
A coverage table fills the gap, answering questions such as:

What did not sell that was on promotion?
Who was assigned to that customer?

Coverage tables are usually factless: they hold dimension keys but no measures.


Measuring What Happened

The Sales Facts table does not reveal who is assigned to a customer if no sale is made to that customer.

Sales Facts: time_key, product_key, customer_key, rep_key, quantity, sales_dollars

Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Product: product_key, Category, Brand, Product, SKU
Customer: customer_key, Name, Company, Account, Phone_num
Sales_rep: rep_key, rep_name, rep_phone, Region, District, State, City

Coverage Table

Customer_coverage_facts shows who is assigned to a customer at a point in time.

Customer Coverage Facts: time_key, customer_key, rep_key

Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Customer: customer_key, Name, Company, Account, Phone_num
Sales_rep: rep_key, rep_name, rep_phone, Region, District, State, City
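The kind of question only the coverage table can answer ("which assignments produced no sales?") can be sketched with Python's sqlite3 module. The sample rows are invented, and both tables are assumed, for simplicity, to use the same time_key period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_facts (time_key INTEGER, product_key INTEGER,
                          customer_key INTEGER, rep_key INTEGER,
                          quantity INTEGER, sales_dollars REAL);
-- Factless coverage table: one row per rep assignment, no measures.
CREATE TABLE customer_coverage_facts (time_key INTEGER, customer_key INTEGER,
                                      rep_key INTEGER);
INSERT INTO customer_coverage_facts VALUES (1, 100, 7), (1, 200, 7), (1, 300, 8);
INSERT INTO sales_facts VALUES (1, 55, 100, 7, 3, 30.0);
""")

# Assignments with no matching sale: the coverage table supplies the
# customer/rep pairs that the sales star alone cannot see.
no_sales = conn.execute("""
    SELECT c.customer_key, c.rep_key
    FROM customer_coverage_facts AS c
    LEFT JOIN sales_facts AS s
      ON s.time_key = c.time_key
     AND s.customer_key = c.customer_key
     AND s.rep_key = c.rep_key
    WHERE s.customer_key IS NULL
    ORDER BY c.customer_key
""").fetchall()
print(no_sales)
```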


Snapshot and Transaction


Viewing a single process in multiple ways:

Transactions: the changes to what is being measured. Example: changes to inventory.
Snapshot: the status at a point in time. Example: current status of inventory.


Snapshot

How much is on hand today? How much was on hand yesterday?

Inventory Snapshot: time_key, product_key, location_key, quantity_on_hand

Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Product: product_key, Category, Brand, Product, SKU
Location: location_key, Warehouse, WH_code, City, State


Transaction

How did inventory change today? How much product was returned due to failed inspection?

Inventory Transactions: time_key, product_key, location_key, transaction_type_key, transaction_amount

Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
Product: product_key, Category, Brand, Product, SKU
Location: location_key, Warehouse, WH_code, City, State
Transaction_type: transaction_type_key, transaction_type_code, transaction_type, transaction_category
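The relationship between the two views can be sketched in Python: a daily snapshot is the running total of the transactions, and each view answers different questions. The transaction rows and type names below are hypothetical; negative amounts reduce stock.

```python
from collections import defaultdict

# Hypothetical inventory transactions: (day, product, location, type, amount).
transactions = [
    (1, "widget", "WH1", "receipt", 100),
    (1, "widget", "WH1", "shipment", -30),
    (2, "widget", "WH1", "returned_failed_inspection", -5),
    (2, "widget", "WH1", "receipt", 20),
]

def daily_snapshots(transactions, days):
    """Roll the transactions forward into end-of-day quantity_on_hand rows."""
    by_day = defaultdict(list)
    for day, product, location, _type, amount in transactions:
        by_day[day].append(((product, location), amount))
    on_hand = defaultdict(int)
    snapshot = []
    for day in days:
        for key, amount in by_day[day]:
            on_hand[key] += amount
        # One snapshot row per product/location per day, even on quiet days.
        for (product, location), qty in sorted(on_hand.items()):
            snapshot.append((day, product, location, qty))
    return snapshot

snapshot = daily_snapshots(transactions, days=[1, 2, 3])

# "How did inventory change today?" sums transactions (fully additive).
change_day_2 = sum(amt for day, *_, amt in transactions if day == 2)
# "How much was returned due to failed inspection?" filters on the type.
failed_inspection = -sum(t[4] for t in transactions
                         if t[3] == "returned_failed_inspection")
# "How much is on hand?" reads snapshot rows; quantity_on_hand is
# semi-additive, so it must not be summed across days.
on_hand_day_2 = [qty for day, _, _, qty in snapshot if day == 2]
```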


Aggregate Tables

Aggregate table: a fact table that summarizes another fact table.
Created for performance reasons.
Covered in the previous section.


Design Tools for Multiple Tables

Create a set of matrices:
facts vs. dimensions
facts vs. dimensional attributes

Mark where facts apply to dimensions and where facts apply to dimensional attributes.
When a fact does not apply, assume a separate fact table.


Bus Matrix

A planning methodology for large data warehouses with multiple data marts or dimensional models.
Enables technical planning as well as executive communication.
Exceptionally effective for distributed data warehouses without a center.
It is simply a vertical list of data marts and a horizontal list of dimensions.
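A bus matrix is small enough to keep as a plain data structure. The Python sketch below assumes the three data marts (Sales, Shipments, Inventory) and the dimensions from the stovepipe and interlocking-star examples later in this module. It prints the matrix and lists the dimensions used by more than one mart, which are the ones that must be conformed.

```python
# Bus matrix: one row per data mart, one column per dimension.
bus = {
    "Sales":     {"Store", "Day", "Product"},
    "Shipments": {"Warehouse", "Day", "Product"},
    "Inventory": {"Warehouse", "Month", "Product"},
}
dimensions = ["Store", "Warehouse", "Product", "Day", "Month"]

def render(bus, dimensions):
    """Return the matrix as text, with an X where a mart uses a dimension."""
    width = max(len(mart) for mart in bus)
    rows = [" " * width + " | " + " | ".join(dimensions)]
    for mart, dims in bus.items():
        cells = [("X" if d in dims else "").center(len(d)) for d in dimensions]
        rows.append(mart.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(rows)

def shared_dimensions(bus):
    """Dimensions used by more than one mart: candidates for conforming."""
    counts = {}
    for dims in bus.values():
        for d in dims:
            counts[d] = counts.get(d, 0) + 1
    return sorted(d for d, n in counts.items() if n > 1)

print(render(bus, dimensions))
print(shared_dimensions(bus))
```

Month is used by only one mart in this sketch, but because it is a rollup of Day it still has to conform to Day for inventory to be compared with sales and shipments.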


Example Matrix

Fact vs dimensional attribute matrix

[Figure: Facts 1-4 against Attributes 1-8, with X marks showing where each fact applies; the marks split the facts into Fact Table 1 and Fact Table 2.]

Exercise 4

Scenario

Industry: Automobile manufacturing
Company: Millennium Motors
Value chain focus: Sales

Sample business questions:
What are the top 10 selling car models this month?
How do this month's top 10 selling models compare to the top 10 over the last six months?
Show me dealer sales by region by model by day.
How many cars have been purchased over the last six months by customers with yearly household incomes greater than $200,000?

Exercise 4 - continued

Using these source data elements, design a star schema that answers the proposed business questions:

Daily sales revenue, Daily quantity sold, Model, Dealer, Dealer city, Product line, Region where sold, State, Vehicle category, Date of sales

Customer name, Customer zip code, Customer yearly income, P.O. Number, Purchase price, Discount amount, Brand of car


Exercise 4 - worksheet


Exercise 4 Solution - Matrix

Facts vs. dimensional attributes:
daily_sales, daily_quantity apply to: Model, Dealer, Dealer city, Product line, Brand of car, Region where sold, State, Vehicle category, Date of sales
purchase_price, discount_amount apply to: all of the above, plus Customer name, Customer zip code, Customer income, P.O. Number

Exercise 4 - Star schema


Daily Sales Facts: model_key, dealer_key, time_key, revenue, quantity
Customer Sales Facts: model_key, dealer_key, time_key, customer_key, po_number, purchase_price, discount_amt

Model: model_key, brand, category, line, model
Dealer: dealer_key, region, state, city, dealer
Time: time_key, year, quarter, month, date
Customer: customer_key, customer_name, customer_zip, yearly_income

Q&A


Architected Data Marts


Data Mart

Meaning of the term 'data mart' has shifted over the last several years...


Data Mart Architecture 1993

[Figure: Operational Systems, E.T.L. Software, Data Warehouse, E.T.L. Software, Data Marts, Query & Reporting Software, Analysis Users]

Data Mart Architecture 1997

[Figure: Operational Systems, E.T.L. Software, Data Marts, Query & Reporting Software, Analysis Users]

Architected Data Marts

[Figure: Operational Systems, E.T.L. Software, Data Warehouse, Data Marts, Query & Reporting Software, Analysis Users]

Data Mart: Warehouse Subject Area

Incremental warehouse development
Centralized architecture
Not new
Well-suited to star schemas

Stovepipe Data Marts

[Figure: three independent stars whose dimensions are not conformed: Sales Facts (Time (Day), Store, Product), Shipments Facts (Time (Day), Warehouse, Product) and Inventory Facts (Month, Warehouse, Product).]

Stovepipe data marts:
Inconsistent and overlapping data
Difficult and costly to maintain
Redundant data loads
Can't drill across
Integration requires starting over

Conformed Dimensions

Definition

Dimensions are conformed when they are the same, or when one dimension is a strict rollup of another.


Conformed Dimensions

Same dimensions must:

1. have exactly the same set of primary keys, and
2. have the same number of records.


Conformed Dimensions

Rolled-up dimension: one dimension is a strict rollup of another.

This means the two conformed dimensions can be combined into a single logical dimension by creating a union of their attributes.
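The two conformance tests can be expressed as simple checks. This Python sketch assumes dimensions held as dictionaries keyed by surrogate key; the rows and the month-level days_in_month attribute are hypothetical. `combine` builds the single logical dimension by taking the union of the attributes.

```python
# Hypothetical dimension rows keyed by surrogate key.
day_dim = {
    20240130: {"date": "2024-01-30", "month": "2024-01", "year": 2024},
    20240131: {"date": "2024-01-31", "month": "2024-01", "year": 2024},
    20240201: {"date": "2024-02-01", "month": "2024-02", "year": 2024},
}
month_dim = {
    202401: {"month": "2024-01", "year": 2024, "days_in_month": 31},
    202402: {"month": "2024-02", "year": 2024, "days_in_month": 29},
}

def same_dimension(a, b):
    """Test for 'same': identical primary keys and the same row count."""
    return set(a) == set(b) and len(a) == len(b)

def _signature(row, shared):
    return tuple(row[attr] for attr in shared)

def is_strict_rollup(detail, rollup, shared):
    """True if every detail row rolls up to exactly one rollup row whose
    shared attributes carry identical values."""
    index = {}
    for key, row in rollup.items():
        sig = _signature(row, shared)
        if sig in index:  # two rollup rows look alike: ambiguous parent
            return False
        index[sig] = key
    return all(_signature(row, shared) in index for row in detail.values())

def combine(detail, rollup, shared):
    """Union of attributes: one logical dimension at the detail grain."""
    parents = {_signature(r, shared): r for r in rollup.values()}
    return {key: {**parents[_signature(row, shared)], **row}
            for key, row in detail.items()}

shared = ["month", "year"]
print(same_dimension(day_dim, month_dim))            # False
print(is_strict_rollup(day_dim, month_dim, shared))  # True
print(combine(day_dim, month_dim, shared)[20240201]["days_in_month"])  # 29
```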


Conformed Dimensions

Description

Shared common dimensions
Integrates logical design
Ensures consistency between data marts
Allows incremental development
Independent of physical location
Some re-work may be required


Conformed Dimensions

Advantages

Enables an incremental development approach
Easier and cheaper to maintain
Drastically reduces extraction and loading complexity
Answers business questions that cross data marts
Supports both centralized and distributed architectures


Interlocking Star Schemas


[Figure: Sales Facts, Shipment Facts and Inventory Facts interlocking through the Store, Time, Product, Warehouse and Month dimensions]


Conformed Dimensions

Kimball's Data Warehouse Bus

[Figure: Sales Facts, Shipment Facts and Inventory Facts attached to a bus of conformed dimensions: Store, Product, Day, Warehouse, Month]

When to Conform

Two approaches

Up-front
As-you-go

Both approaches work.

Choose the approach that works for you


Conform Up Front

Cross-enterprise analysis
Create first-cut stars for all subject areas
Conform all dimensions
Finalize design & build subject area 1
Finalize design & build subject area 2
Finalize design & build subject area 3

Conform As-You-Go

Design & build subject area 1
Design & build subject area 2, then conform dimensions
Design & build subject area 3, then conform dimensions
Design & build subject area 4, then conform dimensions


Some re-work required

Q&A


Course Review


Rationale for dimensional modeling
Dimensional modeling basics
Dimensional modeling details
Fact table details
Dimension table details
Design process
Aggregate schemas
Multiple fact tables
Architected data marts
