Vous êtes sur la page 1sur 51

Jose Hernandez

Director of Business Intelligence


Dunn Solutions Group

Agenda
Introduction
What is a Data Warehouse?
Dimensional Modeling

Overview

Chicago

Minneapolis

Raleigh

Full-service IT consulting firm


Founded in 1988
Offices

Chicago
Minneapolis
Raleigh
Bangalore, India

Bangalore

Practice Areas
Application Development

Business Intelligence

Packaged Solutions

Custom App Dev

End-to-End BI

Legal Dashboard

Mobile App Dev

Data Warehouse

Visible Visitors

Portals

Dashboards

Web Design

DI + EIM/Quality

E-Commerce

Budgeting & Planning

Training
Open-Enrollment

Map Intelligence
On-Site + Custom
Managed Services
Jumpstart/Mentoring
Predictive Analytics

Selected Clients

City of Chicago

Partnerships

Introduction: The New Series


Focus on Data Warehousing
Tool Agnostic
Kimball Focus

Introduction: This Presentation


Data Warehouse Back to Basics
We start with 50,000 foot view
Assuming you are new to data
warehousing

Keep it fundamental
Kimball point of view
What, Why and How

Why Build a Data Warehouse


We have mountains of data in this
company but we cant access it!
We need to slice and dice the data
in a variety of ways.
You have to make it easy for
business people to get at the data.

Two people present the same


business metrics and the numbers
are different!
We want people to make decisions
based on facts.

Why Build a Data Warehouse


Operational systems are not
integrated

IDs and Codes not conformed


Inconsistent format
Data quality issues

Operational systems generally


not ideal for reporting

Lack history
Complex data structure
Moving target

Poor query performance

Goals of a Data Warehouse


Make an organizations data easy
to access
Present the organizations data
consistently
Be adaptive and resilient to
change

Trusted and secure


Serve as the foundation for
informed decisions
Business community must accept
the warehouse if it is to be
successful

Agenda
Introduction
What is a Data Warehouse?
Dimensional Modeling

What is a Data Warehouse?


A simple question
- does not seem
to have simple
answer!
Many definitions

Two that you


should consider

Ralph Kimball
Bill Inmon

What is a Data Warehouse


A data warehouse is a system that extracts,
cleans, conforms and delivers source data into a
dimensional data store and then supports and
implements querying and analysis for the purpose
of decision making...
Its the place where users go to get their data
Ralph Kimball

What a Data Warehouse is NOT


It is NOT

Jose

A product
A language
A project

A data model
A copy of your transactional systems
*Note: There are bundled products that come close to covering many aspects of
a data warehouse!

The BI Stack

Extract
Clean
Conform
Deliver
ETL
Management
Services
ETL Data Stores

Presentation Server

ETL System

Source Systems

Legacy
mainframe
systems
Production
databases
Transactional
systems
Subscription data

Data Marts
Stars &
Snowflakes
Conformed
Dimensions
Conformed Facts

BI Applications

Front Room

Back Room

Metadata
Infrastructure and Security

Reporting
systems
Ad hoc systems
Dashboards
Analytics
systems

Our Focus Today

Extract
Clean
Conform
Deliver
ETL
Management
Services
ETL Data Stores

Presentation Server

ETL System

Source Systems

Legacy
mainframe
systems
Production
databases
Transactional
systems
Subscription data

Data Marts
Stars &
Snowflakes
Conformed
Dimensions
Conformed Facts

BI Applications

Front Room

Back Room

Metadata
Infrastructure and Security

Reporting
systems
Ad hoc systems
Dashboards
Analytics
systems

Agenda
Introduction
What is a Data Warehouse?
Dimensional Modeling

Dimensional Modeling
Dimensional modeling
is a technique which
allows you to design a
database that meets
the goals of a data
warehouse.

Steps
Identify Business Process

Identify Grain (level of


detail)
Identify Dimensions
Identify Facts
Build Star

Identify the Business Process


Requirements + Data Availability
Determine discrete business
processes (e.g.)
Sales

Inventory
Student Registration

Identify the Grain


Grain is the level of detail
stored in the data
warehouse.

Typically implement the


lowest possible
dimension grain:

Do we store all products, or


just product categories?

not because users need


individual records

Each month, week, day,


hour?

Has a big impact on size of


database.

because they want to


aggregate in many different
ways

Can be a different grain


for each fact

Identify Dimensions
Selection Criteria (where Gender=Female)
Row Headers (College Name, Region, )
How do you want to slice the data?
What are the artifacts of your business?

Time Dimension - Always present


Conforming Dimensions very important aspect
of a successful data warehouse!*
*More on this later

Identify the Facts


Facts are the storage place for the measurements
we take...
Flavors of Facts
Counts, Sums

Transaction Grain

Additive

Periodic Snapshot Grain

Non-Additive

Accumulating Snapshot
Grain

Semi-Additive
Fact-less Facts

Dimensional Modeling - Stars


Why is it called a star?

Dimensional Modeling - Stars


Because it looks like a
star! (kinda)
Fact Table in the center
Dimension Tables
surrounding it

Dimensional Modeling - Constellation

Dimensional Modeling Fact Tables


Fact Tables
The center of the star
schema
Based on a business
process
Contains the business
process measures
All measures in the fact are
of the same grain

Fact tables are narrow but


deep

Dimensional Modeling Dim Tables


Dimension Tables
Business entities used to
slice up (determine the
grain) of the Facts
Verbose and textual
Should be conformed
across the organization
Wide but shallow
Always use surrogate
keys*
*exception for the Date Dimension

Star Schema Physical Model

Date Dimension (my favorite dimension)


The Basic Date Dimension

Date Dimension
Special Date Dimension Attributes
In another language
Semester (First Semester, Second
Semester, )

High Season (Y/N), Low Season (Y/N)


Season (Winter, Spring, Summer, Fall)
Reporting Day (CurrDay, CurrDay-1D,
CurrDay-2d)
Reporting Month (CurrMonth,
CurrMont-1M, )
Last Day of Quarter (Y/N)
Last Day of Week (Y/N)
American Holiday (Independence Day,
Christmas, )

Canadian Holiday
And so many more!

Slowly Changing Dimensions


Known as SCDs
Dimensions change, how
do you handle this?
Three Basic Types
Type 1
Type 2
Type 3

Hmmm.... these
are very
descriptive names.

Slowly Changing Dimensions (SCDs)


Type 1:

Do not preserve history


Overwrite the record

Type 2:

Preserve all history

Add a new record, indicate


current version

Type 3:

Preserve a point-in-time
history

Add additional column(s)

Type 2

Slowly Changing Dimensions: Type 2


SCD workhorse approach
When a dimension
attribute changes, add a
new row and update
effective dates
Old fact rows point to the
previous dimension row
New fact rows point to the
current dimension row
You can use a flag too

Other types of Dimensions


Rapidly Changing
Dimensions
Mini-dimensions
Degenerate Dimension
Junk Dimension
Outrigger

Rapidly Changing Dimensions


AKA: Rapidly Changing
Monster Dimensions
A dimension with
attributes that change
frequently is considered a
rapidly changing
dimension
Produces very large
dimension tables
Cannot be handled with
Type 2 approach (gets
too big)

Mini-dimensions
Technique for Rapidly
Changing Monster Dimension

Customer Dimension
PK

Customer ID
Name
Address
DoB
Date of First Order
------Age
Gender
Annual Income
Number of Children
Marital Status

Use mini-dimensions

Split up the rapidly changing


attributes to a mini-dimension

Join the mini-dimension to the fact


table

PK

Customer Key
More Foreign Keys
Facts...

Customer Key
Customer ID
Name
Address
DoB
Date of First Order

Minimizes rows (no discrete values)


A significant compromise

FK1

New Customer Dimension

Use banded ranges

Fact Table

Customer Key

Customer Demographics Dim


PK

Customer Demo Key


Age Band
Gender
Annual Income Band
Num of Children Band
Marital Status

Fact Table 2

FK2
FK3

Customer Key
Customer Demo Key
More Foreign Keys
Facts...

Other Dimensions
Rapidly Changing
Dimensions
Mini-dimensions
Degenerate Dimension
Junk Dimension
Outrigger

Other Dimensions
Rapidly Changing
Dimensions
Mini-dimensions
Degenerate Dimension

A dimension key that


has no attributes.
A dimensional attribute
stored in the fact table

Junk Dimension
Outrigger

Examples:
Transaction Number
Invoice Number
Line Item Number
Ticket Number

Other Dimensions
Rapidly Changing
Dimensions
Mini-dimensions
Degenerate Dimension
Junk Dimension
Outrigger

Do you have a drawer in


your kitchen that is a catch
all for stuff that you might
need...the junk drawer?
A collection of low
cardinality flags and
indicators that you might
need.
Examples: Payment Type,
Inbound/Outbound, Order
Type

Other Dimensions
Rapidly Changing
Dimensions

Exception, not the rule!


The start of snow-flaking

Mini-dimensions
Degenerate Dimension
Junk Dimension

A secondary dimension table is


connected to a dimension table
(not via a fact).

Outrigger

Human Resource Fact


Employee Dimension
FK1

Employee Key
More FK
HR Fact 1
HR Fact 2

PK

Employee Key

FK1

Employee Attributes
......
Emp Skill Key

Emplyee Skill Group (Outrigger)


PK

Emp Skill Key


Emp Skill Description
Emp Skill Category

Just the Facts Tables


Home for the numerical measures
Typically Additive
Three types of Fact Tables
Transactional Grain
Periodic Snapshot Grain
Accumulating Snapshot Grain

Comparison of Fact Table Types


Characteristic

Transaction Grain

Periodic
Snapshot Grain

Accumulating
Snapshot Grain

Time period
represented

Point in time

Regular,
predictable
intervals

Indeterminate time
span, typically
short-lived

Grain

One row per


transaction event

One row per period One row per life

Fact table loads

Insert

Insert

Insert and update

Fact row updates

Not revisited

Not revisited

Revisited
whenever activity

Date dimensions

Transaction date

End of period date

Multiple dates for


standard
milestones

Facts

Transaction activity Performance for


predefined time
interval

Performance over
finite lifetime

What makes it Enterprise?


Conformed Dimensions & Facts
Common fields across the enterprise domains
Common definition across the enterprise domains

The Bus Architecture


Allows traversing across business processes
Promotes conformity

Conformed Dimensions / Bus Architecture

Dimensional Modeling Embellishments


Snowflaking

Bridges

Normalizing a dimension
table
OLTP modeler tendency

Many to many
relationships not resolved
in fact tables

Not optimal for query


performance

Sits between a dimension


and a fact

Outriggers
A dimension table is
referenced in another
dimension (i.e. hire date
example)

Ragged and variable


depth hierarchies

Snowflaking
What is Snowflaking?
Normalizing in a star
schema
Should be avoided

Adds complexity to
presentation layer

SQL is more complex


Adds burden to database optimizers
Very little space savings
Impacts Bitmap indexes*

Sometimes OK (Outriggers for low cardinality attributes)


*good for low cardinality fields

Snowflaking
What is Snowflaking?
Normalizing in a star
schema
Should be avoided

Adds complexity to
presentation layer

SQL is more complex


Adds burden to database optimizers
Very little space savings
Impacts Bitmap indexes*

Sometimes OK (Outriggers for low cardinality attributes)


*good for low cardinality fields

DW Tips: Dimensional Modeling Myths


Dimensional data warehouses
are appropriate for summary
level data only

Dimensional models
presuppose the business
questions and therefore are
inflexible
Dimensional models are
departmental
Brining a new data source into
a dimensional data warehouse
breaks existing schemas and
requires new fact tables

Kimball University White Paper

A good way to narrow the


scope and manage risk is to
focus on delivering the report
most often requested
Dimensional models are fully
de-normalized
Ralph Kimball invented the fact
and dimension terminology

DW Tips: 10 Essential Dim Mod Rules


Load detailed atomic data into
dimensional structures
Structure dimensional models
around business processes
Ensure every fact table has a
date dimension table
Ensure all facts in a Fact table
are the same grain
Resolve many-to-many
relationships in fact tables
Resolve many to one
relationships in dimension
tables

Kimball University Article, Margy Ross, InformationWeek

Store report lables and filter


domain values in dimension
tables

Dimension tables should use


surrogate keys
Create conformed dimensions
to integrate data across the
enterprise

Continuously balance
requirements and realities to
deliver a DW/BI solution thats
accepted by business users
and that supports their
decision making

Thank You
Future Webinars
The ETL Process
Stars in Motion
Columnar and In-memory
databases
Modeling Business Process

Retail Sales
Inventory
CRM
HR

Vous aimerez peut-être aussi