Operational Systems
Data Marts
Dimensional Modeling
Data warehousing:
The process of constructing and using data warehouses.
Evolution of Data Warehousing
Benefits of a Data Warehouse
High ROI
High implementation cost (anywhere from $50,000 to over $10 million due to a variety of solutions available).
Avg. 3 yr return (ROI) reached 401% (study conducted by International Data Corporation (IDC) in 1996).
90% of companies achieved over 40% ROI.
Half of the companies achieved over 160% ROI.
A quarter achieved more than 600% ROI.
Increased Productivity
Data warehousing improves the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data.
It integrates data from multiple incompatible systems into a form that provides one consistent view of the organization.
By transforming data into meaningful information, a data warehouse allows corporate decision-makers to perform more substantive, accurate and consistent analysis.
Competitive Advantage
The huge returns on investment for companies that have successfully implemented a data warehouse are evidence of the enormous competitive advantage that accompanies this technology.
Competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown and untapped information on, for example, customers, trends and demands.
Data Warehouse Definition (as defined by the Data Warehouse gurus)
Characteristics of a Data Warehouse
Subject Oriented
Organized around major subjects, such as customer, product, sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Integrated
Integrates multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied.
Ensures consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources, e.g., hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
Time Variant
Data warehouse data provides information from a historical perspective (e.g., past 5-10 years).
Non-Volatile
Does not require transaction processing, recovery, and concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of data and access of data.
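The hotel price example under "Integrated" can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the two source layouts, their field names (`hotel_name`, `bkfst`, `net_price`, `tax_rate`), and the exchange rate are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

# Hypothetical exchange rates to one common currency (illustrative values only).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}


@dataclass
class HotelPrice:
    """One consistent warehouse representation of a hotel price."""
    hotel: str
    price_usd: float           # always in USD, always tax-inclusive
    breakfast_included: bool


def from_source_a(rec: dict) -> HotelPrice:
    # Assumed source A: price in EUR with tax included, breakfast coded "Y"/"N".
    return HotelPrice(
        hotel=rec["hotel_name"].strip().title(),
        price_usd=round(rec["price"] * RATES_TO_USD["EUR"], 2),
        breakfast_included=rec["bkfst"] == "Y",
    )


def from_source_b(rec: dict) -> HotelPrice:
    # Assumed source B: price in USD before tax, tax rate stored separately.
    return HotelPrice(
        hotel=rec["name"].strip().title(),
        price_usd=round(rec["net_price"] * (1 + rec["tax_rate"]), 2),
        breakfast_included=bool(rec["breakfast"]),
    )


# Records are converted as they are moved into the warehouse.
warehouse_rows = [
    from_source_a({"hotel_name": "grand plaza ", "price": 80.0, "bkfst": "Y"}),
    from_source_b({"name": "Sea View", "net_price": 100.0,
                   "tax_rate": 0.1, "breakfast": False}),
]
```

After conversion, both rows share the same naming, currency, tax treatment, and encoding, so they can be compared directly.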
Applications of Data Warehouse
Three kinds of data warehouse applications
Information processing
supports querying, basic statistical analysis, and reporting
using cross-tabs, tables, charts and graphs
Analytical processing
multidimensional analysis of data warehouse data
supports basic OLAP operations, slice-dice, drilling, pivoting
Data mining
knowledge discovery from hidden patterns
supports associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools.
Operational Systems (OLTP) Vs Data Warehouse (OLAP)
Operational Systems Data Warehouse Systems
Why a separate Warehouse?
Performance
Special data organisation, access methods, and implementation methods are needed to support multidimensional views and operations typical of OLAP.
Complex OLAP queries degrade performance for operational transactions.
Concurrency control and recovery modes of OLTP are not compatible with OLAP analysis.
Functions
Missing Data: Decision support requires historical data which operational DBs do not typically maintain.
Data Consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs, external sources.
Data Quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled.
Advantages of a Data Warehouse
Data Warehouse Architectural Components
Operational Data:
Source of Data for the Data Warehouse.
Load Manager:
Performs all the operations associated with the extraction and
loading of data into the warehouse.
Warehouse Manager:
Query Manager
Performs all operations associated with the management of user
queries.
Examples,
Directing queries to the appropriate tables.
Scheduling the execution of queries.
Detailed Data
The area that stores all the detailed data in the database schema.
Not available online but made available by aggregating data to the
next level of detail.
On a regular basis, detailed data is added to the warehouse to
supplement the aggregated data.
Lightly and Highly Summarized Data
Stores all the predefined lightly and highly summarized data generated by the warehouse manager.
This area of warehouse is transient in order to respond to changing
query profiles.
Used to speed up performance of queries.
Continuously updated as new data is loaded into the warehouse.
Archive/Backup Data
Area of Data warehouse stores detailed and summarized data for
the purposes of archiving and backup.
Data is transferred to storage devices such as magnetic tape or
optical disk.
Metadata
Area stores all the metadata (data about data) definitions used by
all the processes in the warehouse.
Used for a variety of purposes
Extraction and loading processes.
Warehouse Management process.
As part of the query management process.
Data Marts - Advantages
Data Marts - Disadvantages
Complex maintenance
Scalability issues
Data Warehouse Vs Data Marts
Top Down
Bottom Up
Dimensional Modeling
Features of Dimensional Modeling
Efficiency: The consistency of the underlying database structure allows
more efficient access to the data by various tools like report writers and
query tools
Dimensional Modeling Techniques: Star Schema
Star Schema: A single object (fact table) in the middle connected to a number of dimension tables
sale (fact table): orderId, date, custId, prodId, storeId, qty, amt
customer: custId, name, address, city
product: prodId, name, price
store: storeId, city
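As a minimal sketch, the star schema above can be built in an in-memory SQLite database and queried by joining the fact table to its dimensions. The table and column names follow the slide; the column types, sample rows, and the query itself are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three dimension tables and one central fact table (sale).
conn.executescript("""
CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
CREATE TABLE product  (prodId INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE store    (storeId INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sale (
    orderId INTEGER PRIMARY KEY,
    date    TEXT,
    custId  INTEGER REFERENCES customer(custId),
    prodId  INTEGER REFERENCES product(prodId),
    storeId INTEGER REFERENCES store(storeId),
    qty     INTEGER,
    amt     REAL
);
""")

# Invented sample rows.
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)",
                 [(1, "Ann", "1 Main St", "Oslo"), (2, "Bob", "2 Oak Ave", "Bergen")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(10, "bolt", 0.5), (11, "nut", 0.25)])
conn.executemany("INSERT INTO store VALUES (?, ?)",
                 [(100, "Oslo"), (101, "Bergen")])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "2024-01-05", 1, 10, 100, 4, 2.0),
    (2, "2024-01-05", 2, 11, 101, 8, 2.0),
    (3, "2024-01-06", 1, 11, 100, 4, 1.0),
])

# A typical star join: one hop from the fact table to each dimension it needs.
rows = conn.execute("""
    SELECT st.city, p.name, SUM(s.amt) AS total
    FROM sale s
    JOIN store st  ON s.storeId = st.storeId
    JOIN product p ON s.prodId = p.prodId
    GROUP BY st.city, p.name
    ORDER BY st.city, p.name
""").fetchall()
```

Every dimension is reachable from `sale` through a single join, which is what keeps star-schema queries simple.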
Dimensional Modeling Techniques Snowflake Schema
Snowflake Schema: A refinement of star schema where the dimensional
hierarchy is represented explicitly by normalizing the dimension tables
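To illustrate what normalizing a dimension means, the sketch below splits a store dimension into a store, city, region hierarchy. The `city` and `region` tables, their keys, and all sample rows are hypothetical; the slides do not specify which hierarchy is snowflaked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The store dimension is normalized: store -> city -> region.
conn.executescript("""
CREATE TABLE region (regionId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city   (cityId INTEGER PRIMARY KEY, name TEXT,
                     regionId INTEGER REFERENCES region(regionId));
CREATE TABLE store  (storeId INTEGER PRIMARY KEY,
                     cityId INTEGER REFERENCES city(cityId));
CREATE TABLE sale   (orderId INTEGER PRIMARY KEY,
                     storeId INTEGER REFERENCES store(storeId),
                     amt REAL);

INSERT INTO region VALUES (1, 'East'), (2, 'West');
INSERT INTO city   VALUES (1, 'Oslo', 1), (2, 'Bergen', 2);
INSERT INTO store  VALUES (100, 1), (101, 2), (102, 1);
INSERT INTO sale   VALUES (1, 100, 2.0), (2, 101, 2.0), (3, 102, 1.0);
""")

# Rolling up to region now walks the explicit hierarchy, one join per level.
totals = conn.execute("""
    SELECT r.name, SUM(s.amt)
    FROM sale s
    JOIN store st ON s.storeId = st.storeId
    JOIN city c   ON st.cityId = c.cityId
    JOIN region r ON c.regionId = r.regionId
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()
```

The trade-off is visible in the query: less redundancy in the dimension data, but more joins per query than with a single denormalized store table.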
Step 1: Choosing the process
Step 2: Choosing the grain
Step 3: Identifying the conforming dimensions
Dimensions set the context for formulating queries about the facts in
the fact table
We identify dimensions in sufficient detail to describe things such as
clients and properties at the correct grain
If any dimension occurs in two data marts, they must be exactly the
same dimension, or one must be a subset of the other (this is the only
way that two Data Marts share one or more dimensions in the same
application)
When a dimension is used in more than one Data Mart, the dimension
is referred to as being conformed.
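The "same dimension or a subset" rule can be expressed as a small check. This is a simplified sketch: it treats a dimension as a mapping from key to attribute values and compares rows only, and the function name `is_conformed` and the sample data are hypothetical.

```python
def is_conformed(dim_a: dict, dim_b: dict) -> bool:
    """Return True if the two dimensions are identical or one is a subset
    of the other (same keys carrying the same attribute values)."""
    small, large = (dim_a, dim_b) if len(dim_a) <= len(dim_b) else (dim_b, dim_a)
    return all(key in large and large[key] == attrs
               for key, attrs in small.items())


# Client dimension as used by two hypothetical data marts.
sales_mart_clients = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
rental_mart_clients = {1: {"name": "Ann"}}        # subset: conformed
broken_mart_clients = {1: {"name": "Anne"}}       # same key, different value

conformed = is_conformed(sales_mart_clients, rental_mart_clients)
not_conformed = is_conformed(sales_mart_clients, broken_mart_clients)
```

A real check would also compare attribute names and definitions, not only row contents.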
Step 4: Choosing the facts
The grain of the fact table determines which facts can be used in the
data mart; all facts must be expressed at the level implied by the grain.
In other words, if the grain of the fact table is an individual property
sale, then all the numerical facts must refer to this particular sale (the
facts should be numeric and additive).
FACT Table Types: Additive FACT Tables
GEOGRAPHY_DIM (GEOGRAPHY_KEY)
TIME_DIM (DATE_KEY)
SALES_FACT (GEOGRAPHY_KEY, PRODUCT_KEY, DATE_KEY, PRICE)
FACT Table Types: Semi-Additive FACT Tables
GEOGRAPHY_DIM (GEOGRAPHY_KEY)
TIME_DIM (DATE_KEY)
INVENTORY_BALANCE (GEOGRAPHY_KEY, PRODUCT_KEY, DATE_KEY)
FACT Table Types: Factless FACT Table
GEOGRAPHY_DIM (GEOGRAPHY_KEY)
TIME_DIM (DATE_KEY)
PRODUCT_SALES_FACT (GEOGRAPHY_KEY, PRODUCT_KEY, DATE_KEY)
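The difference between the three fact table types shows up in how their rows may be aggregated. The sketch below uses the SALES_FACT, INVENTORY_BALANCE, and PRODUCT_SALES_FACT layouts from the slides; the `BALANCE` measure is a hypothetical column (the slide does not show a measure for the inventory table), and all rows are invented.

```python
from collections import defaultdict

# Additive: PRICE can be summed across every dimension.
sales_fact = [
    # (GEOGRAPHY_KEY, PRODUCT_KEY, DATE_KEY, PRICE)
    (1, 10, 20240101, 5.0),
    (1, 10, 20240102, 7.0),
    (2, 10, 20240101, 3.0),
]
total_sales = sum(price for *_keys, price in sales_fact)

# Semi-additive: a balance can be summed across geography,
# but summing it across dates would count the same stock repeatedly.
inventory_balance = [
    # (GEOGRAPHY_KEY, PRODUCT_KEY, DATE_KEY, BALANCE)  BALANCE is hypothetical
    (1, 10, 20240101, 40),
    (2, 10, 20240101, 60),
    (1, 10, 20240102, 35),
    (2, 10, 20240102, 55),
]
balance_by_date = defaultdict(int)
for _geo, _prod, date_key, balance in inventory_balance:
    balance_by_date[date_key] += balance          # valid: sum over geography
closing_balance = balance_by_date[max(balance_by_date)]  # over time: latest snapshot

# Factless: rows carry only keys; the analysis counts rows instead of summing a measure.
product_sales_fact = [
    # (GEOGRAPHY_KEY, PRODUCT_KEY, DATE_KEY)
    (1, 10, 20240101),
    (1, 11, 20240101),
    (2, 10, 20240102),
]
events_by_geography = defaultdict(int)
for geo, _prod, _date in product_sales_fact:
    events_by_geography[geo] += 1
```

Taking the latest snapshot is only one way to handle the time dimension for a semi-additive measure; averaging over the period is another common choice.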
Step 5: Storing pre-calculations in FACT tables
Step 6: Rounding out the dimension tables
Step 7: Choosing the duration of database
Step 8: Tracking slowly changing dimensions
Three types of Slowly Changing Dimensions (SCD)
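The slides do not spell the three types out, so the sketch below assumes the commonly used numbering: Type 1 overwrites the old value, Type 2 adds a new row, and Type 3 keeps the previous value in an extra column. The customer dimension, its column names (`cust_key`, `valid_from`, `valid_to`, `prev_city`), and the sample values are hypothetical.

```python
from copy import deepcopy

# One customer row; cust_key is the surrogate key, custId the business key.
customer_dim = [{"cust_key": 1, "custId": "C1", "city": "Oslo"}]


def scd_type1(dim, cust_id, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    rows = deepcopy(dim)
    for row in rows:
        if row["custId"] == cust_id:
            row["city"] = new_city
    return rows


def scd_type2(dim, cust_id, new_city, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    rows = deepcopy(dim)
    for row in rows:
        if row["custId"] == cust_id and row.get("valid_to") is None:
            row["valid_to"] = change_date
    next_key = max(row["cust_key"] for row in rows) + 1
    rows.append({"cust_key": next_key, "custId": cust_id, "city": new_city,
                 "valid_from": change_date, "valid_to": None})
    return rows


def scd_type3(dim, cust_id, new_city):
    """Type 3: keep only the previous value, in an additional column."""
    rows = deepcopy(dim)
    for row in rows:
        if row["custId"] == cust_id:
            row["prev_city"] = row["city"]
            row["city"] = new_city
    return rows


type1 = scd_type1(customer_dim, "C1", "Bergen")
type2 = scd_type2(customer_dim, "C1", "Bergen", "2024-06-01")
type3 = scd_type3(customer_dim, "C1", "Bergen")
```

Type 2 is the only variant here that preserves full history, at the cost of more rows in the dimension table.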
Step 9: Deciding the query priorities and query modes
OLAP Server Architectures
Typical OLAP Operations
Pivot (rotate):
Reorient the cube, visualization, 3D to series of 2D planes.
Multi-Dimensional Analysis
Sales volume as a function of product, month, and region
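A minimal sketch of this idea treats each sale as a cell addressed by (product, month, region) and aggregates over whichever dimensions are not of interest. The helper names `aggregate` and `slice_` and the sample figures are hypothetical, not part of the slides.

```python
from collections import defaultdict

# Each row: (product, month, region, units_sold). Invented sample data.
sales = [
    ("TV", "Jan", "North", 10),
    ("TV", "Feb", "North", 12),
    ("TV", "Jan", "South", 7),
    ("PC", "Jan", "North", 5),
    ("PC", "Feb", "South", 9),
]
DIMS = ("product", "month", "region")


def aggregate(rows, keep):
    """Roll up: sum units over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[3]
    return dict(out)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in rows if row[i] == value]


by_product_month = aggregate(sales, ("product", "month"))
by_month_product = aggregate(sales, ("month", "product"))   # pivot: axes swapped
north_by_product = aggregate(slice_(sales, "region", "North"), ("product",))
grand_total = aggregate(sales, ())                           # roll up everything
```

The pivoted view holds the same values as the original one; only the order of the axes changes.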
Benefits of OLAP