Development
By:
Tushar Kant
Gupta
OVERVIEW ON DATAWAREHOUSE
Why Data Warehousing?
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
What is the most effective distribution channel?
Competitive advantage
Data Structure & Format: Complex; suitable for operational computation | Simple; suitable for business analysis
Access Probability: High | Moderate to low
“Heterogeneities are everywhere”
Personal Databases, Scientific Databases, World Wide Web, Digital Libraries
♦ Different interfaces
♦ Different data representations
♦ Duplicate and inconsistent information
Goal to Access Information: Unified Access to Data
Integration System
World Wide Web, Personal Databases, Digital Libraries, Scientific Databases
• One company-wide warehouse
• Single ETL for enterprise data warehouse (EDW)
• Simpler data access
• Dependent data marts loaded from EDW
Logical data mart and @ctive data warehouse
• ODS and data warehouse are one and the same
• Near real-time ETL for @ctive Data Warehouse
• Data marts are NOT separate databases, but logical views of the data warehouse
• Easier to create new data marts
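The idea that a data mart can be a logical view rather than a separate database can be sketched with Python's built-in sqlite3 module. The table `warehouse_sales`, the view `east_sales_mart`, and the sample rows are hypothetical names and data chosen only for illustration; a real warehouse would use its own DBMS and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical warehouse table (illustrative only).
    CREATE TABLE warehouse_sales (
        region  TEXT,
        product TEXT,
        amount  REAL
    );
    INSERT INTO warehouse_sales VALUES
        ('East', 'Widget', 100.0),
        ('East', 'Gadget', 50.0),
        ('West', 'Widget', 75.0);

    -- The "data mart" is only a view: no rows are copied out of the warehouse.
    CREATE VIEW east_sales_mart AS
        SELECT product, amount FROM warehouse_sales WHERE region = 'East';
""")

def mart_rows():
    """Read the logical data mart; the rows come straight from the warehouse table."""
    query = "SELECT product, amount FROM east_sales_mart ORDER BY product"
    return conn.execute(query).fetchall()

before = mart_rows()
# A new warehouse row is visible through the view with no separate load step.
conn.execute("INSERT INTO warehouse_sales VALUES ('East', 'Sprocket', 20.0)")
after = mart_rows()
```

Because the mart is defined by a query, creating another mart in this sketch is one more `CREATE VIEW` statement rather than a new database and load job.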
DATA MODELING:
Multidimensional Data Schema Support
• Decision Support Data tends to be
• Nonnormalized
• Duplicated
• Preaggregated
Based on this, the commonly used schemas are:
• Star Schema (Most common)
• Special Design technique for
multidimensional data representations
• Optimize data query operations instead of
data update operations
• Snowflake Schema
• Normalized form of star schema
Schema Components
Facts
• Numeric measurements (values) that
represent a specific business aspect or
activity
• Stored in a fact table at the center of the
star schema
• Contains facts that are linked through
their dimensions
• Can be computed or derived at run time
• Updated periodically with data from
operational databases
Dimensions
• Qualifying characteristics that provide
additional perspectives to a given fact
Schema Components
Attributes
• Dimension Tables contain Attributes
• Attributes are used to search, filter, or classify
facts
• Dimensions provide descriptive characteristics
about the facts through their attributes
• Must define common business attributes that will
be used to narrow a search, group information, or
describe dimensions. (ex.: Time / Location /
Product)
• No mathematical limit to the number of dimensions
(3-D makes it easy to model)
Attribute Hierarchies
• Provides a Top-Down data organization
• Aggregation
• Drill-down / Roll-Up data analysis
• Attributes from different dimensions can be
Star Schema
[Figure: Star Schema, showing the Fact Table and the Dimension Tables]
Star Schema Representation
Fact and Dimensions are represented
by physical tables in the data
warehouse database
Fact tables are related to each
dimension table in a Many to One
relationship (Primary/Foreign Key
Relationships)
Fact Table is related to many
dimension tables
• The primary key of the fact table is a
composite key made up of the primary keys of the
dimension tables
Each fact table is designed to answer
a specific DSS question
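A minimal sketch of this representation, using Python's built-in sqlite3 module: the table names (`dim_time`, `dim_location`, `dim_product`, `fact_sales`), the columns, and the sample rows are hypothetical, chosen to match the Time / Location / Product example dimensions. The fact table holds one foreign key per dimension, and those keys together form its composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per member, with descriptive attributes.
    CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: numeric measurements plus a many-to-one link to each dimension.
    CREATE TABLE fact_sales (
        time_id     INTEGER REFERENCES dim_time(time_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        product_id  INTEGER REFERENCES dim_product(product_id),
        units       INTEGER,
        revenue     REAL,
        PRIMARY KEY (time_id, location_id, product_id)  -- composite key
    );

    INSERT INTO dim_time     VALUES (1, 2023, 1), (2, 2023, 2);
    INSERT INTO dim_location VALUES (1, 'East'), (2, 'West');
    INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fact_sales   VALUES
        (1, 1, 1, 10, 100.0),
        (2, 1, 1,  5,  50.0),
        (1, 2, 2,  3,  90.0),
        (2, 2, 1,  2,  20.0);
""")

# A decision-support question: total revenue per region.
revenue_by_region = conn.execute("""
    SELECT l.region, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_location AS l ON f.location_id = l.location_id
    GROUP BY l.region
    ORDER BY l.region
""").fetchall()
```

Swapping `dim_location` for `dim_time` or `dim_product` in the join answers the same question along a different dimension without changing the fact table.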
Strengths of the Dimensional Model
• The dimensional model is a predictable, standard framework
• Withstands unexpected changes in user behavior
• Extensible to accommodate unexpected new data elements and new design decisions
• No query tool or reporting tool needs to be reprogrammed to accommodate the change
• There is a body of standard approaches for handling common modeling situations in the business enterprise
• Availability of a huge body of administrative utilities and software processes that
ETL Process:
Data Reconciliation
Typical operational data is:
• Transient – not historical
• Not normalized (perhaps due to
denormalization for performance)
• Restricted in scope – not comprehensive
• Sometimes poor quality – inconsistencies and
errors
After ETL, data should be:
• Detailed – not summarized yet
• Historical – periodic
• Normalized – 3rd normal form or higher
• Comprehensive – enterprise-wide perspective
• Quality controlled – accurate with full
integrity
The ETL Process
Capture
Scrub or data cleansing
Transform
Load and Index
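The four steps above can be sketched as a small Python pipeline. This is an illustrative outline under stated assumptions, not a production ETL tool: the source rows, the function names (`capture`, `scrub`, `transform`, `load_and_index`), and the target table `customer_payment` are all hypothetical.

```python
import sqlite3

# Hypothetical operational data, including one poor-quality record.
SOURCE_ROWS = [
    {"id": "1", "name": " alice ", "amount": "10.50"},
    {"id": "2", "name": "BOB", "amount": "n/a"},
    {"id": "3", "name": "Carol", "amount": "7"},
]

def capture(source):
    """Capture: take a snapshot copy of the source records."""
    return [dict(row) for row in source]

def scrub(rows):
    """Scrub: tidy names and drop rows whose amount is not a number."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject the inconsistent record
        clean.append({"id": int(row["id"]),
                      "name": row["name"].strip().title(),
                      "amount": amount})
    return clean

def transform(rows):
    """Transform: convert to the target format (amount stored as integer cents)."""
    return [(r["id"], r["name"], round(r["amount"] * 100)) for r in rows]

def load_and_index(rows, conn):
    """Load and index: insert into the warehouse table, then build an index."""
    conn.execute("CREATE TABLE customer_payment "
                 "(id INTEGER, name TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO customer_payment VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_payment_name ON customer_payment(name)")

conn = sqlite3.connect(":memory:")
load_and_index(transform(scrub(capture(SOURCE_ROWS))), conn)
loaded = conn.execute(
    "SELECT id, name, amount_cents FROM customer_payment ORDER BY id"
).fetchall()
```

Keeping each step as its own function makes it straightforward to test the cleansing rules separately from the load.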
Data flow
Record-level:
• Selection – data partitioning
• Joining – data combining
• Aggregation – data summarization
Field-level:
• Single-field – from one field to one field
• Multi-field – from many fields to one, or one field to many
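The record-level and field-level transformations listed above can be illustrated in plain Python. The `orders` and `customers` data and all field names below are hypothetical sample values, used only to show one instance of each kind of transformation.

```python
from collections import defaultdict

# Hypothetical source records.
orders = [
    {"order_id": 1, "cust_id": 10, "region": "East", "amount": 100.0},
    {"order_id": 2, "cust_id": 11, "region": "West", "amount": 40.0},
    {"order_id": 3, "cust_id": 10, "region": "East", "amount": 60.0},
]
customers = {
    10: {"first": "Ada", "last": "Lovelace"},
    11: {"first": "Alan", "last": "Turing"},
}

# Record-level: selection (partition the data by a condition).
east_orders = [o for o in orders if o["region"] == "East"]

# Record-level: joining (combine each order with its customer record).
joined = [{**o, **customers[o["cust_id"]]} for o in orders]

# Record-level: aggregation (summarize amounts per region).
totals = defaultdict(float)
for o in orders:
    totals[o["region"]] += o["amount"]

# Field-level, single-field: one source field to one target field.
amount_cents = [round(o["amount"] * 100) for o in orders]

# Field-level, multi-field: many source fields to one target field.
full_names = [f'{r["first"]} {r["last"]}' for r in joined]

# Field-level, multi-field: one source field to many target fields.
first, last = full_names[0].split(" ", 1)
```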
Steps in data reconciliation (continued)
[Figure: data warehouse data flows (Inflow, Upflow, Downflow, Outflow, Meta-flow). Components shown: operational data sources 1 to n, operational data store (ODS), Load Manager, Warehouse Manager, Query Manager, DBMS, meta-data, detailed data, lightly summarized data, highly summarized data, OLAP (online analytical processing) tools]