
Data Warehouse Concepts

Chapter 1
Data, Information, Knowledge, Decision
Analysis
Report
Chapter 2
Normalization
OLTP Systems
Characteristics of OLTP
Chapter 3
Data Warehouse
Advantages of Data Warehouse
Goals of Data Warehouse

DWH-Training Material

Chapter 4
Characteristics of Data Warehouse
Difference between OLTP/DW
OLAP
Data Warehouse/Data Mart
Data Warehouse Strategies
Chapter 5

Dimensional Modeling
Star Schema
Snowflake Schema
Dimension Table
Conformed Dimension
Degenerate Dimension

Chapter 6
Fact Table
Types of Fact
Metadata Management

Chapter 7

Grain Level
Surrogate Key
Time Dimension
Staging Area
Slowly Changing Dimensions

Chapter 8
Project Overview
Phases of Project

Data >> Decision

Data
-Raw observations
-No meaning

Information
-Meaning by relational connection

Knowledge
-Appropriate collection of information
-Intent is to be useful and to change the business process

What is Knowledge?
Data >> Information >> Knowledge >> Action

Data: raw facts, numbers
Information: data in context
Knowledge: information + experience
Action: knowledge applied to decision making

Scale: readily captured >> strategic value

Analysis

Comparison of sales (fact) of a product (dimension) over years (dimension) in the same region (dimension).

What is the total sales value (fact) of a particular product (dimension) in a store (dimension), in 3 months (dimension)?

What is the amount spent (fact) for a particular product promotion (dimension) in a particular branch (dimension), in a particular city (dimension), in a year (dimension)?
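
An analysis question of this kind maps onto a filter on the dimensions and a sum over the fact. The sketch below is only an illustration: the table name sales, its columns, and the sample rows are all made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical single-table layout: three dimensions and one fact (amount).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, store TEXT, month TEXT, amount REAL)"
)
rows = [
    ("Soap", "Store A", "2024-01", 120.0),
    ("Soap", "Store A", "2024-02", 80.0),
    ("Soap", "Store A", "2024-03", 100.0),
    ("Soap", "Store A", "2024-04", 90.0),  # outside the 3-month window
    ("Soap", "Store B", "2024-01", 50.0),  # different store
    ("Tea", "Store A", "2024-02", 70.0),   # different product
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def total_sales(product, store, first_month, last_month):
    """Total sales value (fact) for one product in one store over a month range."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE product = ? AND store = ? AND month BETWEEN ? AND ?",
        (product, store, first_month, last_month),
    )
    return cur.fetchone()[0]


q1_total = total_sales("Soap", "Store A", "2024-01", "2024-03")
print(q1_total)
```

Each dimension becomes a condition in the WHERE clause, and the fact is the column being summed.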

Report

Collection of data
Purpose: analysis, comparative study of data, historical data
Final: improve decision

Chapter 2

Normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process:
Eliminating redundant data
Ensuring data dependencies make sense

First Normal Form

First normal form (1NF) sets the very basic rules for an organized database:
Eliminate duplicate columns from the same table
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key)

Second Normal Form

Second normal form (2NF) further addresses the concept of removing duplicative data:
Meet all the requirements of the first normal form
Remove subsets of data that apply to multiple rows of a table and place them in separate tables
Create relationships between these new tables and their predecessors through the use of foreign keys

Third Normal Form

Third normal form (3NF) goes one step further:
Meet all the requirements of the second normal form
Remove columns that are not dependent upon the primary key
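
A small sketch of the idea behind these rules, assuming a made-up order list in which customer details repeat on every row. The field names (order_id, customer_name, customer_city, amount) are illustrative only. The repeated customer data moves to its own table and each order keeps just a foreign key.

```python
# Denormalized input: customer details are repeated on every order row.
flat_orders = [
    {"order_id": 1, "customer_name": "Asha", "customer_city": "Pune", "amount": 250},
    {"order_id": 2, "customer_name": "Asha", "customer_city": "Pune", "amount": 100},
    {"order_id": 3, "customer_name": "Ravi", "customer_city": "Delhi", "amount": 75},
]

customers = {}     # customer_id (primary key) -> customer attributes
customer_ids = {}  # (name, city) -> customer_id, used to spot repeats
orders = []        # order rows that reference customers by foreign key

for row in flat_orders:
    key = (row["customer_name"], row["customer_city"])
    if key not in customer_ids:
        new_id = len(customer_ids) + 1
        customer_ids[key] = new_id
        customers[new_id] = {"name": key[0], "city": key[1]}
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": customer_ids[key],  # foreign key to customers
            "amount": row["amount"],
        }
    )

print(customers)
print(orders)
```

After the split, a change of city for a customer touches one row in customers instead of every order row.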

Information System/OLTP Systems

OLTP systems are highly normalized databases
The purpose of OLTP systems is to capture data and carry out DML activities
The purpose of a data warehouse is multidimensional analysis
Examples of OLTP applications: Equity Plans, Shares, Insurance, Loans, Savings

Characteristics - OLTP

Characteristic             OLTP
Operation                  Insert/Update
Analytical Requirements    Low
Data per Transaction       Small
Data Level                 Detailed
Orientation                Records

Business Intelligence

From an information systems standpoint, BI provides users with online analytical processing or data analysis capabilities to predict trends, evaluate business questions and so on.

From a business analyst viewpoint, it is the process of gathering high-quality, meaningful information about a subject, which enables the analyst to draw conclusions.

Chapter 3

Data Warehouse
Data warehousing is the entire
process of data extraction,
transformation and loading of data to
the warehouse and the access of the
data by end users and applications.

Data Warehouse Architecture


Advantages through DW

Acquire new customers
Retain existing customers
Improve customer satisfaction
Sell more products

Goals of Data Warehouse


Easy access to organization information
The data warehouse must be adaptive and resilient to change
Secure environment to protect information assets
Foundation for improved decision making

Chapter 4

Data Warehouse Characteristics

Subject-Oriented
Integrated
Non-Volatile
Time-Variant

Difference - OLTP and DW

They are both databases
They both hold data
But they have been designed for different scopes: running the business (OLTP systems) v/s managing the business (DWH)

Operational systems focus on present data
DWHs focus on historical data (present, past)
OLTP systems are optimized to insert/update and store data
DWHs are optimized to select/analyze data

OLTP v/s Data Warehouse

                OLTP                              OLAP (DW)
Access          Read/Write                        Read, lots of scans
Unit of Work    Short, simple transaction         Query
# Users         Thousands                         Hundreds
DB Size         100 MB - GB                       100 GB - Terabytes
Function        Day-to-day operations             Decision support
DB Design       Application oriented              Subject oriented
Data            Current, up to date, detailed     Historical, summarized

OLAP

OLAP is an acronym for Online Analytical Processing.

OLAP performs multidimensional analysis of business data and provides the capability for complex calculations and trend analysis. OLAP enables end users to perform ad hoc analysis of data in multiple dimensions, thereby providing the insight and understanding they need for better decision making.

OLAP operations
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
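
The four operations can be tried out on a tiny in-memory data set. This is only a sketch in plain Python: the sample rows, the dimension names in DIMS, and the helper names aggregate and slice_rows are assumptions made for illustration, not part of any OLAP product.

```python
from collections import defaultdict

# Each row: (year, quarter, region, product, sales). Sample data is made up.
DIMS = ("year", "quarter", "region", "product")
cells = [
    (2023, "Q1", "North", "Soap", 10),
    (2023, "Q2", "North", "Soap", 20),
    (2023, "Q1", "South", "Tea", 5),
    (2024, "Q1", "North", "Tea", 15),
    (2024, "Q1", "South", "Soap", 30),
]


def aggregate(rows, by):
    """Sum the sales fact grouped by the named dimensions."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[4]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


drill_down = aggregate(cells, ["year", "quarter"])  # finer level of detail
roll_up = aggregate(cells, ["year"])                # coarser level of detail
north = slice_rows(cells, "region", "North")        # slice: region = North
dice = slice_rows(north, "year", 2023)              # dice: add a second filter
by_region_product = aggregate(cells, ["region", "product"])
pivoted = aggregate(cells, ["product", "region"])   # pivot: swap the axes

print(roll_up)
print(drill_down)
```

Roll-up and drill-down are the same aggregation at different grouping levels, while slice and dice only restrict which rows take part.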

Data Mart / Data Warehouse

A data mart stores data for a limited number of subject areas, such as marketing or sales data.

A data warehouse deals with multiple subject areas and is typically implemented and controlled by a central organization unit such as the corporate information factory. It is often called a central or enterprise data warehouse.

Data Warehouse / Data Marts

Property               Data Warehouse     Data Mart
Scope                  Enterprise         Department
Subjects               Multiple           Single
Data Source            Many               Few
Implementation time    Months to years    Months

Data Warehousing Strategies

Enterprise-wide warehouse, top down, the Inmon methodology
Data mart, bottom up, the Kimball methodology
When properly executed, both result in an enterprise-wide data warehouse, but with different architectures

Top Down Approach


Figure: External Data and Operational Systems feed the Data Warehouse, which feeds the Marketing, Sales and Finance Data Marts.

Bottom Up Approach

Figure: Legacy Data, Operations Data and External data sources feed the Marketing, Sales and Finance Data Marts, which together feed the Data Warehouse.

Chapter 5 and Chapter 6

Data Warehouse Architecture

Dimensional Modeling

Dimensional modeling provides users the ability to view data based on the organization of the business and the important characteristics of the data.

There are two major components of dimensional analysis:
Dimensions, which determine how data will be presented; and
Facts, which determine what data will be presented.

Dimension Table Examples

Retail -- store name, zip code, product name, product category, day of the week
Telecommunications -- call origin, call destination
Banking -- customer name, account number, branch, account officer
Insurance -- policy type, insured party

Dimension Table Characteristics

Dimension tables have the following characteristics:
Contain textual information that represents the attributes of the business
Contain relatively static data
Are joined to a fact table through a foreign key reference
Are hierarchical in nature and provide the ability to view data at varying levels of detail

Fact Table Examples

Retail -- number of units sold, sales amount
Telecommunications -- length of the call in minutes, average number of calls
Banking -- average monthly balance
Insurance -- claims amount

Fact Table Characteristics

Fact tables have the following characteristics:
Contain numerical metrics of the business
Can hold large volumes of data
Can grow quickly
Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
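
How a fact table joins to its dimension tables can be sketched with an in-memory SQLite database. The table names (dim_product, dim_store, fact_sales), their columns, and the sample rows below are assumptions chosen to match the retail examples, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: textual attributes, one primary key each.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category TEXT
    );
    CREATE TABLE dim_store (
        store_key INTEGER PRIMARY KEY,
        store_name TEXT,
        zip_code TEXT
    );
    -- Fact table: numeric measures plus foreign keys to the dimensions.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key INTEGER REFERENCES dim_store(store_key),
        units_sold INTEGER,
        sales_amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Soap', 'Toiletries'), (2, 'Tea', 'Grocery');
    INSERT INTO dim_store VALUES (1, 'Store A', '10001'), (2, 'Store B', '20002');
    INSERT INTO fact_sales VALUES
        (1, 1, 3, 30.0), (1, 2, 2, 20.0), (2, 1, 1, 8.0), (2, 1, 4, 32.0);
    """
)

# The dimension decides how the data is presented (by category);
# the fact decides what is presented (sales amount).
sales_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(sales_by_category)
```

Swapping dim_product for dim_store in the join presents the same facts by store instead of by category.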

Star Schema

Snowflake Schema

Conformed Dimensions

A dimension table which is shared across data marts or by more than one fact table
Example:
Calendar/Date/Time Dimension
Customer Dimension
Product Dimension

Degenerate Dimension

A degenerate dimension is something dimensional in nature that exists only in the fact table, for example an invoice number.

Fact Tables

Types of Measures
Additive facts
Non-additive facts
Semi-additive facts

Fact Tables

Additive Facts
Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Example: dollar value is an additive fact. If we want to find the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.

Non-Additive Facts

Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Example: heights of citizens measured by geographical location. When we roll up city data to state-level data we should not add the heights of the citizens; rather, we may want to use them to derive a count.
Example: percentage (%)

Semi-additive facts
Semi-additive facts are facts that can be summed up for
some of the dimensions in the fact table, but not the others.
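
A common illustration of a semi-additive fact is an account balance, which is an assumption here since the slide names no example. Balances can be added across accounts for one month, but adding one account's balances across months gives a meaningless number. The sketch below uses made-up rows and helper names.

```python
# Each row: (account, month, balance). Sample data is made up.
balances = [
    ("ACC-1", "2024-01", 100),
    ("ACC-2", "2024-01", 50),
    ("ACC-1", "2024-02", 120),
    ("ACC-2", "2024-02", 30),
]


def total_balance_for_month(rows, month):
    """Valid: sum across the account dimension for a single month."""
    return sum(balance for _, m, balance in rows if m == month)


def closing_balance(rows, account):
    """Across time, take the latest balance instead of summing."""
    latest_month, balance = max(
        (m, b) for acc, m, b in rows if acc == account
    )
    return balance


january_total = total_balance_for_month(balances, "2024-01")
acc1_closing = closing_balance(balances, "ACC-1")
# Summing over months is possible to compute but has no business meaning.
acc1_naive_sum = sum(b for acc, _, b in balances if acc == "ACC-1")

print(january_total, acc1_closing, acc1_naive_sum)
```

The naive sum of 220 for ACC-1 is not a balance the account ever held, which is why the time dimension needs a different aggregate such as the latest value or an average.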

Factless Fact Table


A factless fact table is a fact table that does not have any measures.

Figure: a fact table holding only the foreign keys Teacher_FK, Course_FK, Student_FK and Location_FK, joined to four dimensions:
Location Dimension (Location_PK)
Teacher Dimension (Teacher_PK)
Course Dimension (Course_PK)
Student Dimension (Student_PK)
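
Even without measures, such a table can answer questions by counting rows. The sketch below assumes each row records one student attending one course with one teacher at one location; that reading of the figure, and the sample key values, are assumptions.

```python
from collections import Counter

# Each row: (teacher_fk, course_fk, student_fk, location_fk). No measures.
attendance = [
    (1, 10, 100, 5),
    (1, 10, 101, 5),
    (2, 11, 100, 6),
]

# Counting rows per course key gives the number of students per course.
students_per_course = Counter(course for _, course, _, _ in attendance)

# Counting distinct courses per student key works the same way.
courses_per_student = Counter(student for _, _, student, _ in attendance)

print(dict(students_per_course))
print(dict(courses_per_student))
```

The "fact" here is simply that the event happened, so COUNT takes the place of SUM.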


Metadata

It is data about data
Vital to the warehouse
Used by everyone
The key to understanding warehouse information

Chapter 7

Grain Level

Level at which the data has to be captured in the fact table
Examples:
Each sales transaction
Insurance claim transaction
Monthly account

Surrogate Keys

A surrogate key has no meaning other than identifying each record uniquely; it is used as the primary key of almost all dimension tables
It is just a sequence number
Advantages of surrogate keys include:
Control over data
Avoid using the OLTP keys as data warehouse keys
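
One way to picture surrogate key assignment is a lookup that hands out the next sequence number the first time a source key is seen. The class name SurrogateKeyGenerator, the method key_for, and the source system names are hypothetical; real ETL tools and databases provide their own sequence mechanisms.

```python
import itertools


class SurrogateKeyGenerator:
    """Maps (source system, OLTP key) pairs to warehouse sequence numbers."""

    def __init__(self):
        self._sequence = itertools.count(1)
        self._by_source_key = {}

    def key_for(self, source_system, natural_key):
        lookup = (source_system, natural_key)
        if lookup not in self._by_source_key:
            # First time this source record is seen: take the next number.
            self._by_source_key[lookup] = next(self._sequence)
        return self._by_source_key[lookup]


keys = SurrogateKeyGenerator()
crm_customer = keys.key_for("crm", "C-001")
billing_customer = keys.key_for("billing", "C-001")  # same OLTP key, other system
crm_again = keys.key_for("crm", "C-001")

print(crm_customer, billing_customer, crm_again)
```

Because the warehouse key is independent of the OLTP key, two source systems that reuse the same identifier do not collide.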

Data Staging

Often used as an interim step between data extraction and later steps
No end user access to staging

Source >> Staging >> Target

Slowly Changing Dimensions (SCD)

Slowly changing dimensions change gradually and occasionally over time.
Example: employees change their address, name, marital status

SCD Types

Type 1
Approach: overwrite the old values in the dimension record
Results: only current; loses the ability to track the old history

Type 2
Approach: create an additional dimension record (with a time stamp) at the time of the change with the new attribute values
Results: history + current; segments history very accurately between the old description and the new description

Type 3
Approach: create new current fields and move the old attribute into a previous-value field
Results: previous + current; describes both historical and current view
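
The three approaches can be compared on a single employee record. This is a minimal sketch: the column names (row_key, employee_id, city, valid_from, valid_to), the previous_ prefix, and the function names are assumptions chosen for the example.

```python
from datetime import date


def apply_type1(row, attribute, new_value):
    """Type 1: overwrite the value; the old value is lost."""
    updated = dict(row)
    updated[attribute] = new_value
    return updated


def apply_type2(rows, employee_id, attribute, new_value, change_date):
    """Type 2: close the current row and add a new time-stamped row."""
    next_key = max(r["row_key"] for r in rows) + 1
    result = []
    for row in rows:
        if row["employee_id"] == employee_id and row["valid_to"] is None:
            result.append({**row, "valid_to": change_date})
            result.append(
                {
                    **row,
                    "row_key": next_key,
                    attribute: new_value,
                    "valid_from": change_date,
                    "valid_to": None,
                }
            )
        else:
            result.append(dict(row))
    return result


def apply_type3(row, attribute, new_value):
    """Type 3: keep the old value in a 'previous' field on the same row."""
    updated = dict(row)
    updated["previous_" + attribute] = row[attribute]
    updated[attribute] = new_value
    return updated


current = {
    "row_key": 1,
    "employee_id": "E1",
    "city": "Pune",
    "valid_from": date(2020, 1, 1),
    "valid_to": None,
}

type1 = apply_type1(current, "city", "Delhi")
type2 = apply_type2([current], "E1", "city", "Delhi", date(2024, 6, 1))
type3 = apply_type3(current, "city", "Delhi")

print(type1)
print(type2)
print(type3)
```

Type 2 is the only variant that grows the table, which is the price of keeping the full history.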

Project Manager
Business Analyst
Architect
ETL Lead
Source System Study
Data Modeler
OLAP Lead
ETL Devs/Cons
OLAP Devs/Cons
DBA
Test Lead
Tester

Phases of Project

Phase 1 - Define
Phase 2 - Analysis
Phase 3 - Design
Phase 4 - Build
Phase 5 - Test
Phase 6 - Production

The Define Phase

The Analysis Phase

The Design Phase

The Build Phase

The Test Phase

Transition to Production Phase
