MINING (SE-409)
Lecture-5 and 6
Huma Ayub
Software Engineering department
Ahsan Abdullah 2
Why ROLAP?
Scalability: MOLAP suffers from the curse of dimensionality.
ROLAP as a “Cube”
• OLAP data is stored in a relational database (e.g. a star
schema)
• The fact table can be visualized as an “un-rolled” cube.
Fact Table
Month   Product   Zone   Sale (K Rs.)
M1      P1        Z1     250
M2      P2        Z1     500
[Figure: cube; axis labels Product and Time]
How to create “Cube” in ROLAP
• Cube is a logical entity containing values of a certain fact
at a certain aggregation level at an intersection of a
combination of dimensions.
[Figure: cross-tabulation with rows P1, P2, P3 and Total]
How to create “Cube” in ROLAP using SQL
• For the table entries, without the totals
SELECT S.Month_Id, S.Product_Id,
SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Month_Id, S.Product_Id;
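The query above can be tried out with a minimal sketch that uses Python's built-in `sqlite3` module. The table layout follows the fact table shown earlier; the `Zone_Id` column name and the two extra rows marked "illustrative" are assumptions added only so that the aggregation has something to sum.

```python
import sqlite3

# Hypothetical in-memory fact table. The first two rows follow the slide;
# the last two are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Zone_Id TEXT, Sales_Amt INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("M1", "P1", "Z1", 250),
        ("M2", "P2", "Z1", 500),
        ("M1", "P1", "Z2", 100),  # illustrative
        ("M2", "P1", "Z2", 50),   # illustrative
    ],
)

# One row per (month, product) cell of the cube, without the totals.
query = """
SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Month_Id, S.Product_Id
ORDER BY S.Month_Id, S.Product_Id
"""
cells = conn.execute(query).fetchall()
for row in cells:
    print(row)
```

The zone dimension is aggregated away here because it does not appear in the GROUP BY list.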
Problem With Simple Approach
• The number of required queries grows exponentially with
the number of dimensions.
– In the example, the first query can do most of the work of the
other two queries.
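The exponential growth can be made concrete with a short sketch. It assumes that a full cube needs one GROUP BY query per subset of the dimensions, including the empty subset for the grand total, which gives 2^n queries for n dimensions. The helper name `groupings` is hypothetical.

```python
from itertools import combinations

def groupings(dimensions):
    """Return every subset of the dimensions, largest first.

    Each subset corresponds to one GROUP BY query of a full cube.
    """
    return [
        combo
        for r in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, r)
    ]

dims = ["Month_Id", "Product_Id"]
for g in groupings(dims):
    print("GROUP BY " + ", ".join(g) if g else "(grand total, no GROUP BY)")

# Number of queries for 1 to 5 dimensions: doubles with every added dimension.
counts = {n: len(groupings(["d%d" % i for i in range(n)])) for n in range(1, 6)}
print(counts)
```

The finest grouping comes first in the list; every coarser grouping could in principle be computed from its result rather than from the fact table.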
CUBE Clause
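As a rough illustration of what a CUBE clause computes: databases that implement it (for example Oracle, SQL Server and PostgreSQL) accept `GROUP BY CUBE (...)` and return all groupings in one statement. SQLite, used in the sketch below, does not implement CUBE, so the same result is emulated with a UNION ALL over every subset of the dimensions. The table contents are made up, and rolled-up dimensions are reported as NULL.

```python
import sqlite3
from itertools import combinations

# With a CUBE-capable database the whole result would come from:
#   SELECT Month_Id, Product_Id, SUM(Sales_Amt)
#   FROM Sales GROUP BY CUBE (Month_Id, Product_Id);
# SQLite lacks CUBE, so this sketch emulates it.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Sales_Amt INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [("M1", "P1", 250), ("M2", "P2", 500), ("M2", "P1", 50)],  # illustrative rows
)

dims = ["Month_Id", "Product_Id"]
parts = []
for r in range(len(dims), -1, -1):
    for kept in combinations(dims, r):
        # Dimensions that are rolled up appear as NULL in the output row.
        cols = ", ".join(d if d in kept else "NULL AS " + d for d in dims)
        group_by = " GROUP BY " + ", ".join(kept) if kept else ""
        parts.append("SELECT " + cols + ", SUM(Sales_Amt) AS Total FROM Sales" + group_by)

cube_rows = conn.execute(" UNION ALL ".join(parts)).fetchall()
for row in cube_rows:
    print(row)
```

The output contains the individual cells, the per-month totals, the per-product totals and the grand total.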
ROLAP & Space Requirement
If one is not careful with aggregation, the number of summary
tables grows very large as the number of dimensions
increases.
EXAMPLE: ROLAP & Space Requirement
A naïve implementation will require all combinations of summary
tables at each and every aggregation level.
…
24 summary tables; adding
geography results in 120 tables.
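The arithmetic behind these figures can be sketched as a product of the number of aggregation levels per dimension. The level counts below (6 for time, 4 for product, 5 for geography) are assumptions chosen only because they reproduce 24 and 120; the slide does not state the actual hierarchies.

```python
from math import prod

# Hypothetical number of aggregation levels per dimension.
levels = {"time": 6, "product": 4}

# One summary table per combination of levels across the dimensions.
tables_without_geography = prod(levels.values())

# Adding a dimension multiplies the count by its number of levels.
levels["geography"] = 5
tables_with_geography = prod(levels.values())

print(tables_without_geography, tables_with_geography)
```

Whatever the exact hierarchies were, the jump from 24 to 120 is a factor of 5, which is why each added dimension is so costly for a naive implementation.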
HOLAP
DOLAP
[Figure: cube on the remote server; local machine/server]
Dimensional Modeling (DM)
The need for ER modeling?
• Problems with early COBOLian data processing
systems.
• Collection of data
• Data redundancies
Why has ER modeling been so successful?
– Coupled with normalization, it drives all the
redundancy out of the database.
Need for DM: Un-answered Qs
• Let's have a look at a typical ER data model first.
• Some observations:
– All tables look alike; as a consequence, it is difficult to identify:
• Too complex for queries that span multiple tables with a large
number of records
ER vs. DM
ER                                          DM
Constituted to optimize OLTP performance.   Constituted to optimize DSS query performance.
• Bring it to DSS
• Two general methods:
– De-Normalization
What is DM?…
• A simpler logical model optimized for decision support.
• Inherently dimensional in nature (fact + dimension), with a single
central fact table and a set of smaller dimensional tables.
• Multi-part key for the fact table. The fact table is long in terms of
data and contains numerical data, e.g. how many items were sold, what
revenue the sales brought in, and how much we needed to sell.
• Single-column primary key for each dimensional table.
What is DM?...
Dimensions have Hierarchies
Items
  Books
    Engg
    Medical
  Clothes
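A hierarchy like this lets a measure be rolled up from the lowest level to every level above it. The sketch below is one minimal way to do that. It assumes that Engg and Medical sit under Books, and the sales figures are made up; the `roll_up` helper is hypothetical.

```python
# Hypothetical child-to-parent links for the item hierarchy.
parent = {
    "Engg": "Books",
    "Medical": "Books",
    "Books": "Items",
    "Clothes": "Items",
}

# Made-up sales recorded at the lowest level of each branch.
sales = {"Engg": 120, "Medical": 80, "Clothes": 200}

def roll_up(leaf_values, parent):
    """Add each leaf value to all of its ancestors in the hierarchy."""
    totals = dict(leaf_values)
    for node, value in leaf_values.items():
        p = parent.get(node)
        while p is not None:
            totals[p] = totals.get(p, 0) + value
            p = parent.get(p)
    return totals

totals = roll_up(sales, parent)
print(totals)
```

Each level of the hierarchy corresponds to one possible aggregation level for queries on this dimension.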
The two Schemas
Star
Snow-flake
“Simplified” 3NF (Retail)
[Figure: ER diagram of a retail schema in 3NF; the tables are linked by 1:M relationships. Recoverable table names and columns:]
Sales:      sale_header (RECEIPT #, STORE #, DATE, ...)
            sale_detail (RECEIPT #, ITEM #, ..., $)
Store:      store (STORE #, STREET, ZONE, ...)
Geography:  zone, district, division, with column pairs (ZONE, CITY), (CITY, DISTRICT), (DISTRICT, DIVISION), (DIVISION, PROVINCE)
Time:       week (DATE, WEEK), month (WEEK, MONTH), quarter (MONTH, QTR), year (QTR, YEAR)
Product:    item_x_cat (ITEM #, CATEGORY), item_x_splir (ITEM #, SUPPLIER), cat_x_dept (CATEGORY, DEPT)
Vastly Simplified Star Schema
[Figure: star schema; each dimension table is linked to the fact table by a 1:M relationship.]
Fact Table:     RECEIPT#, STORE#, ITEM#, DATE, ... facts
Product Dim:    ITEM#, CATEGORY, DEPT, SUPPLIER
Geography Dim:  STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE
Time Dim:       DATE, WEEK, QUARTER, YEAR
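To make the star layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`sales_fact`, `product_dim`, `geography_dim`, `time_dim`, `amount`, `sale_date`) and all rows are assumptions for illustration. Only the general shape follows the slide: one fact table with a multi-part key surrounded by single-key dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (item_no INTEGER PRIMARY KEY, category TEXT, dept TEXT, supplier TEXT);
CREATE TABLE geography_dim (store_no INTEGER PRIMARY KEY, zone TEXT, city TEXT,
                            district TEXT, division TEXT, province TEXT);
CREATE TABLE time_dim (sale_date TEXT PRIMARY KEY, week INTEGER, quarter TEXT, year INTEGER);

-- Fact table: multi-part key made of the dimension keys, plus a numeric fact.
CREATE TABLE sales_fact (
    receipt_no INTEGER,
    store_no   INTEGER REFERENCES geography_dim(store_no),
    item_no    INTEGER REFERENCES product_dim(item_no),
    sale_date  TEXT    REFERENCES time_dim(sale_date),
    amount     INTEGER,
    PRIMARY KEY (receipt_no, store_no, item_no, sale_date)
);

-- Made-up rows.
INSERT INTO product_dim VALUES (10, 'Books', 'Dept 1', 'Supplier X');
INSERT INTO product_dim VALUES (11, 'Clothes', 'Dept 2', 'Supplier Y');
INSERT INTO geography_dim VALUES (1, 'Z1', 'City A', 'District A', 'Division A', 'Province A');
INSERT INTO geography_dim VALUES (2, 'Z2', 'City B', 'District B', 'Division B', 'Province A');
INSERT INTO time_dim VALUES ('2024-01-01', 1, 'Q1', 2024);
INSERT INTO time_dim VALUES ('2024-01-02', 1, 'Q1', 2024);
INSERT INTO sales_fact VALUES (1, 1, 10, '2024-01-01', 100);
INSERT INTO sales_fact VALUES (1, 1, 11, '2024-01-01', 40);
INSERT INTO sales_fact VALUES (2, 2, 10, '2024-01-02', 70);
INSERT INTO sales_fact VALUES (3, 1, 10, '2024-01-02', 30);
""")

# A typical decision-support query: each dimension is one join away from the facts.
by_city_category = conn.execute("""
SELECT g.city, p.category, SUM(f.amount)
FROM sales_fact f
JOIN geography_dim g ON g.store_no = f.store_no
JOIN product_dim p ON p.item_no = f.item_no
GROUP BY g.city, p.category
ORDER BY g.city, p.category
""").fetchall()
print(by_city_category)
```

In the 3NF version of the same data, reaching the city or the category would take a chain of joins through several small tables rather than one join per dimension.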
The Benefit of Simplicity
Features of Star Schema
Dimensional hierarchies are collapsed into a single table for
each dimension. Loss of information? The relationships between
hierarchy levels are lost.
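Collapsing a hierarchy can be sketched as a join that flattens the normalized tables into one dimension table. The sketch borrows the table names `item_x_cat` and `cat_x_dept` from the 3NF retail diagram; the column names and rows are made up, and `product_dim` is a hypothetical name for the resulting dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF) hierarchy: one small table per level-to-level relationship.
CREATE TABLE item_x_cat (item_no INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE cat_x_dept (category TEXT PRIMARY KEY, dept TEXT);
INSERT INTO item_x_cat VALUES (1, 'Engg books'), (2, 'Medical books'), (3, 'Shirts');
INSERT INTO cat_x_dept VALUES ('Engg books', 'Books'), ('Medical books', 'Books'),
                              ('Shirts', 'Clothes');

-- Collapse the hierarchy into a single denormalized dimension table.
CREATE TABLE product_dim AS
    SELECT i.item_no, i.category, c.dept
    FROM item_x_cat i
    JOIN cat_x_dept c ON c.category = i.category;
""")

product_dim = conn.execute(
    "SELECT item_no, category, dept FROM product_dim ORDER BY item_no"
).fetchall()
for row in product_dim:
    print(row)
```

After the collapse, the category-to-department mapping is no longer stored once in its own table; it is only implied by the values repeated on each item row.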