to Data Warehouse
รศ.ดร. วรพจน์ กรีสุระเดช
Worapoj Kreesuradej, Ph.D.
Associate Professor
[Figure: a Wrapper on top of each Source]
Disadvantages of Query-Driven Approach
Delay in query processing
Inefficient and potentially expensive
for frequent queries
Competes with local processing at
sources
The Warehousing Approach
Information integrated in advance
Stored in warehouse for direct querying and analysis
[Figure: Clients query the Data Warehouse; an Integration System with Metadata loads it from an Extractor/Monitor attached to each Source]
Advantages of Warehousing Approach
High query performance
Doesn’t interfere with local processing
at sources
Information copied at warehouse
Can modify, annotate, summarize,
restructure, etc.
Can store historical information
Security, no auditing
Characteristics of DW
Subject oriented: Data are organized by how users refer to it
Integrated: Inconsistencies are removed in both nomenclature and conflicting information (i.e. data are ‘clean’)
Non-volatile: Read-only data. Data do not change over time.
Time variant: Data are time series, not current status
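As a small illustration of the "Integrated" characteristic, the sketch below maps records from two sources onto a single naming and coding convention. The field names (`cust_id`, `id`, `gender`, `sex`) and the code table are hypothetical, chosen only to show the idea.

```python
# Hypothetical encodings that different sources might use for the same attribute.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}

def integrate(record):
    """Map one source record onto a single warehouse naming and coding convention."""
    # Different sources name the same field differently (nomenclature conflict).
    customer_id = record.get("cust_id", record.get("id"))
    gender_raw = str(record.get("gender", record.get("sex", ""))).strip().lower()
    # Unknown codes become None instead of being guessed.
    return {"customer_id": customer_id, "gender": GENDER_CODES.get(gender_raw)}

from_source_a = integrate({"id": 7, "sex": "Male"})
from_source_b = integrate({"cust_id": 7, "gender": 1})
print(from_source_a == from_source_b)
```

Both source records end up in the same shape, which is what "clean" means here.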
Subject Oriented
Data Warehouse is designed around
“subjects” rather than processes
A company may have
Retail Sales System
Outlet Sales System
Catalog Sales System
DW will have a Sales Subject Area
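The idea of one Sales Subject Area fed by several sales systems can be sketched as follows. The row layout (`sku`, `amount`) and the sample values are assumptions made for illustration, not data from the slides.

```python
# Hypothetical rows from the three operational sales systems.
retail = [{"sku": "A1", "amount": 20.0}]
outlet = [{"sku": "A1", "amount": 12.5}]
catalog = [{"sku": "B7", "amount": 40.0}]

def build_sales_subject_area(**systems):
    """Merge per-system sales rows into one subject area, tagging each row with its origin."""
    merged = []
    for channel, rows in systems.items():
        for row in rows:
            merged.append({"channel": channel, **row})
    return merged

sales = build_sales_subject_area(retail=retail, outlet=outlet, catalog=catalog)

# A question about the subject "sales" no longer depends on which system took the order.
total_by_sku = {}
for row in sales:
    total_by_sku[row["sku"]] = total_by_sku.get(row["sku"], 0.0) + row["amount"]
print(total_by_sku)
```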
Subject Oriented
[Figure: OLTP Systems feeding the Sales Subject Area of the Data Warehouse]
Non-Volatile
Operational update of data does not occur
in the data warehouse environment.
Does not require transaction
processing, recovery, and concurrency
control mechanisms
Requires only two operations in data
accessing:
initial loading of data and access of
data.
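A toy sketch of a store that offers only the two operations named above, loading and access. The class name `WarehouseTable` and its methods are hypothetical; a real warehouse enforces this through its load process and permissions rather than a Python class.

```python
class WarehouseTable:
    """Toy table that supports only loading and reading; there is no update or delete."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading only appends; rows already stored are never modified.
        self._rows.extend(tuple(r) for r in rows)

    def read(self, predicate=lambda row: True):
        # Rows are returned as immutable tuples in a fresh list.
        return [r for r in self._rows if predicate(r)]

table = WarehouseTable()
table.load([("2023-01-01", "A1", 20.0), ("2023-01-02", "B7", 40.0)])
big_sales = table.read(lambda r: r[2] > 30)
print(big_sales)
```

Because nothing is updated in place, the transaction processing, recovery, and concurrency control machinery of an operational system is not needed in this sketch.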
Non-Volatile (Read-Mostly)
[Figure: users read from and write to OLTP; users only read from the DW]
Time Variant
The time horizon for the data warehouse is
significantly longer than that of operational
systems.
Operational database: current value data.
Data warehouse data: provide information
from a historical perspective (e.g., past 5-10 years)
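The contrast between current-value data and a historical perspective can be sketched like this. The names `operational`, `warehouse`, `record_price`, and `price_as_of`, and the sample prices, are assumptions for illustration.

```python
from datetime import date

operational = {}   # product -> current value only
warehouse = []     # (snapshot_date, product, price) history

def record_price(day, product, price):
    operational[product] = price              # overwrite: only the current value survives
    warehouse.append((day, product, price))   # append: every dated value is kept

def price_as_of(day, product):
    """Latest warehouse price recorded on or before `day`, or None if there is none."""
    history = [(d, v) for d, p, v in warehouse if p == product and d <= day]
    return max(history)[1] if history else None

record_price(date(2022, 1, 1), "A1", 10.0)
record_price(date(2023, 1, 1), "A1", 12.0)

print(operational["A1"])                     # current value only
print(price_as_of(date(2022, 6, 1), "A1"))   # value as it stood in mid-2022
```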
Time Variant
Analysis has a time component
[Figure: Source Systems (OLTP), Staging Area, DW Database (DW)]
Dimensional Modeling
Facts are stored in FACT Tables
Dimensions are stored in
DIMENSION tables
Dimension tables contain textual
descriptors of the business
Fact and dimension tables form a
Star Schema
“BIG” fact table in center surrounded
by “SMALL” dimension tables
Star Schema
TIME
# TIME_KEY
* ORDERDATE
* DAY_OF_WEEK
* DAY_NUMBER_IN_MONTH
* DAY_NUMBER_IN_YEAR
* WEEK_NUMBER
* MONTH
* QUARTER
* HOLIDAY_FLAG
* FISCAL_YEAR
* FISCAL_QUARTER
CUSTOMER
# CUSTOMER_KEY
* CID
* CNAME
* STATE
* CITY
PRODUCT
# PRODUCT_KEY
* PID
* PNAME
* PCNAME
SALES (references TIME, PRODUCT, and CUSTOMER)
# TIME_KEY
# PRODUCT_KEY
# CUSTOMER_KEY
* PRICE
* QUANTITY
* SALES
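A minimal sketch of this star schema using Python's built-in `sqlite3` module. The table suffixes (`_dim`, `_fact`), the column types, the reduced TIME column list, and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, orderdate TEXT, month INTEGER, quarter INTEGER);
CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, pid TEXT, pname TEXT, pcname TEXT);
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, cid TEXT, cname TEXT, state TEXT, city TEXT);
-- The fact table sits in the center and references every dimension.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    price REAL, quantity INTEGER, sales REAL,
    PRIMARY KEY (time_key, product_key, customer_key)
);
-- Made-up sample rows.
INSERT INTO time_dim VALUES (1, '2023-01-15', 1, 1), (2, '2023-04-02', 4, 2);
INSERT INTO product_dim VALUES (1, 'P1', 'Pen', 'Stationery');
INSERT INTO customer_dim VALUES (1, 'C1', 'Ann', 'CA', 'Fresno');
INSERT INTO sales_fact VALUES (1, 1, 1, 2.0, 3, 6.0), (2, 1, 1, 2.0, 5, 10.0);
""")

# A typical star join: numeric facts grouped by textual dimension attributes.
rows = conn.execute("""
    SELECT t.quarter, p.pname, SUM(f.sales)
    FROM sales_fact f
    JOIN time_dim t    ON t.time_key = f.time_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY t.quarter, p.pname
    ORDER BY t.quarter
""").fetchall()
print(rows)
```

The query touches the large fact table once and uses the small dimension tables only to label and group the result.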
Data mart
Data mart = subset of DW for community
users, e.g. accounting department
Sometimes exists as a multidimensional
database
Info mart = summarized data + report for
community users
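One way to picture a data mart as a subset of the warehouse is a view filtered for one community of users, with an info mart as a summary on top of it. The sketch below uses `sqlite3`; the table `dw_transactions`, the view `accounting_mart`, and the sample rows are hypothetical, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_transactions (dept TEXT, account TEXT, amount REAL);
INSERT INTO dw_transactions VALUES
    ('accounting', 'payables', 100.0),
    ('accounting', 'receivables', 250.0),
    ('marketing', 'ads', 75.0);
-- Data mart: the subset of the warehouse that the accounting department uses.
CREATE VIEW accounting_mart AS
    SELECT account, amount FROM dw_transactions WHERE dept = 'accounting';
""")

mart_rows = conn.execute(
    "SELECT account, amount FROM accounting_mart ORDER BY account"
).fetchall()

# Info mart: summarized data for the same community of users.
info_mart = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM accounting_mart"
).fetchone()
print(mart_rows, info_mart)
```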
Meta Data
Data about data
Needed by both information technology
personnel and users
IT personnel need to know data sources and
targets; database, table and column names;
refresh schedules; data usage measures; etc.
Users need to know entity/attribute
definitions; reports/query tools available;
report distribution information; help desk
contact information, etc.
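The two audiences for metadata can be illustrated with one record that serves both views. The class `ColumnMetadata`, its fields, and the sample values are assumptions; actual metadata repositories hold far more than this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMetadata:
    table: str
    column: str
    source: str             # where the data comes from (IT view)
    refresh_schedule: str   # when it is reloaded (IT view)
    definition: str         # business meaning (user view)

def technical_view(m):
    """What IT personnel need: source, target, and refresh schedule."""
    return {"source": m.source, "target": f"{m.table}.{m.column}", "refresh": m.refresh_schedule}

def business_view(m):
    """What users need: the attribute and its definition."""
    return {"attribute": m.column, "definition": m.definition}

# Hypothetical entry describing one warehouse column.
entry = ColumnMetadata(
    table="SALES", column="QUANTITY",
    source="retail_orders.qty", refresh_schedule="nightly",
    definition="Number of units sold on the order line",
)
print(technical_view(entry))
print(business_view(entry))
```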
Information Delivery Tools
Tools
Query & reporting
OLAP
Data mining, visualization, segmentation,
clustering
New developments: text mining, web mining
& personalization
Mining multimedia data
Information Delivery Tools
Commercial tools
Crystal Reports, Impromptu, WebFOCUS
MDB=Multidimensional databases
System Architectures