Vous êtes sur la page 1sur 45

 Review

 Today
Architecture
 Star Schema
 Fact table
Characteristics
 Dimensions
Relational OLAP  Drilling Down &
Multidimensional OLAP Roll up
ROLAP VS. MOLAP  Slicing & Dicing
 Implementation
techniques for
OLAP
 Bit map indexes
 Join indexes
 File org.
10/15/08 Sudarshan 1
What is Star
Schema?
 Star Schema is a relational database schema for
representing multidimensional data.
 It is the simplest form of data warehouse
schema that contains one or more dimensions
and fact tables.
 It is called a star schema because the entity-
relationship diagram between dimensions and
fact tables resembles a star where one fact table
is connected to multiple dimensions.
 The center of the star schema consists of a
large fact table and it points towards the
dimension tables.
 The advantage of star schema are slicing down,
performance
10/15/08 increase and easy understanding of2
Sudarshan
Steps in designing Star
Schema
 Identify a business process for
analysis(like sales).
 Identify measures or facts (sales
dollar).
 Identify dimensions for facts(product
dimension, location dimension, time
dimension, organization dimension).
 List the columns that describe each
dimension.(region name, branch name,
region name).
 Determine the lowest level of summary
in a fact table(sales dollar).
10/15/08 Sudarshan 3
Important aspects of Star Schema
& Snow Flake Schema
 In a star schema every dimension will have a
primary key.
 In a star schema, a dimension table will not have
any parent table.
 Whereas in a snow flake schema, a dimension
table will have one or more parent tables.
 Hierarchies for the dimensions are stored in the
dimensional table itself in star schema.
 Whereas hierarchies are broken into separate
tables in snow flake schema. These hierarchies
helps to drill down the data from topmost
hierarchies to the lowermost hierarchies.
10/15/08 Sudarshan 4
Fact
 Facts are numeric measurements
(values) that represent a specific
business activity.
 Example, sales figures are numeric
measurements that represent product
and/or service sales.
 Facts are used in business data analysis,
are units, cost, prices and revenues.
 Facts are stored in a FACT table I.e. the
center of the star schema.
10/15/08 Sudarshan 5
Fact Table
The centralized table in a star schema is called
as FACT table, that contains facts and connected to
dimensions. A fact table typically has two types of
columns:
 those that contain facts and
 those that are foreign keys to dimension tables.
 The primary key of a fact table is usually a composite
key that is made up of all of its foreign keys.
 A fact table might contain either detail level facts or
facts that have been aggregated (fact tables that
contain aggregated facts are often instead
called summary tables). A fact table usually
contains facts with the same level of aggregation.
10/15/08 Sudarshan 6
 Many OLAP applications are based on a fact
table

 For example, a supermarket application might


be based on a table
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

 The table can be viewed as multidimensional


 Market_Id, Product_Id, Time_Id are the dimensions
that represent specific supermarkets, products,
and time intervals
 Sales_Amt is a function of the other three

10/15/08 Sudarshan 7
Fact Table (Conclusion)
 Central table
 mostly raw numeric items
 narrow rows, a few columns at most
 large number of rows (millions to a
billion)
 Access via dimensions

10/15/08 Sudarshan 8
Dimension

 Qualifying characteristics that provide


additional perspective to a given fact.
 Example, sales might be compared by product
from region to region and from one time period
to the next.
 Here sales have product, location and time
dimensions.
 Such dimensions are stored in DIMENSIONAL
TABLE.

10/15/08 Sudarshan 9
Dimension Tables
 The dimensions of the fact table are further
described with dimension tables
 Fact table:
Sales (Market_id, Product_Id, Time_Id,
Sales_Amt)
 Dimension Tables:
Market (Market_Id, City, State, Region)
Product (Product_Id, Name, Category, Price)
Time (Time_Id, Week, Month, Quarter)

10/15/08 Sudarshan 10
Attributes
 Each dimension table contain
attributes.
 Used to search, filter and classify facts.
 Example, Sales, we can identify some
attributes for each dimension:
 Product Dimension: product ID,
description, product type
 Location Dimension: region, state, city.
 Time Dimension: year quarter, month,
week and date.

10/15/08 Sudarshan 11
Attributes hierarchy
 AH provides a top-down data
organization
 Used for aggregation and drill-
down/roll-up data analysis.
 Example, location dimension attributes
can be organized in a hierarchy by
region, state and city.
 AH provides the capability to perform
drill-down and roll-up searches.
 Allows the DW and OLAP systems to to
have defined path.
10/15/08 Sudarshan 12
(location)

all all

region Europe ... North_America

country Germany ... Spain Canada ... Mexico

city Frankfurt ... Vancouver ... Toronto

office L. Chan ... M. Wind

10/15/08 Sudarshan 13
Multidimensional Data

 Sales volume as a function of product,


month, and region Dimensions: Product, Location, Time
Hierarchical summarization paths
o n
gi

Industry Region Year


Re

Category Country Quarter


Product

Product City Month Week

Office Day

Month
10/15/08 Sudarshan 14
A Sample Data Cube

Total annual sales


Date of TV in U.S.A.
1Qtr 2Qtr 3Qtr 4Qtr sum
ct

TV
du

PC U.S.A
o
Pr

VCR

Country
sum
Canada

Mexico

sum

10/15/08 Sudarshan 15
Star Schema
 A single fact table and for each dimension
one dimension table
 Does not capture hierarchies directly

date, custno, prodno, cityname, ... p


T
i r
m o
e f d
a
c c c
u t i
s t
t y
10/15/08 Sudarshan 16
 Example of Star Schema: Figure 1.6

10/15/08 Sudarshan 17
 In the example, sales fact table is connected to
dimensions location, product, time and
organization. It shows that data can be sliced
across all dimensions and again it is possible
for the data to be aggregated across multiple
dimensions. "Sales dollar" in sales fact table
can be calculated across all dimensions
independently or in a combined manner which
is explained below.

Sales dollar value for a particular product


 Sales dollar value for a product in a location
 Sales dollar value for a product in a year
within a
10/15/08 Sudarshan 18
Example of Star Schema
time
time_key item
day item_key
day_of_the_week Sales Fact Table item_name
month brand
quarter time_key type
year supplier_type
item_key
branch_key
branch location
location_key
branch_key location_key
branch_name units_sold street
branch_type city
dollars_sold province_or_street
country
avg_sales
Measures

10/15/08 Sudarshan 19
Aggregation
 Many OLAP queries involve aggregation of the
data in the fact table

 For example, to find the total sales (over time)


of each product in each market, we might use

SELECT S.Market_Id, S.Product_Id, SUM


(S.Sales_Amt)
FROM Sales S
GROUP BY S.Market_Id, S.Product_Id

 The aggregation is over the entire time


dimension and thus produces a two-
dimensional view of the data
10/15/08 Sudarshan 20
Time
 The output of the previous query
Market_Id
M1 M2 M3 M4
SUM(Sales_Amt)
Product_Id

P1 3003 1503 …
P2 6003 2402 …
P3 4503 3 …
P4 7503 7000 …
P5 … … …

10/15/08 Sudarshan 21
Typical OLAP
Operations
 Roll up (drill-up): summarize data
 by climbing up hierarchy or by dimension reduction
 Drill down (roll down): reverse of roll-up
 from higher level summary to lower level summary or
detailed data, or introducing new dimensions
 Slice and dice:
 project and select
 Pivot (rotate):
 reorient the cube, visualization, 3D to series of 2D
planes.
 Other operations
 drill across: involving (across) more than one fact
table
 drill through: through the bottom level of the cube to
its back-end relational tables (using SQL)
10/15/08 Sudarshan 22
Drilling Down and
Rolling Up
 Some dimension tables form an aggregation
hierarchy
Market_Id → City → State → Region
 Executing a series of queries that moves down
a hierarchy (e.g., from aggregation over
regions to that over states) is called drilling
down
 Requires the use of the fact table or
information more specific than the requested
aggregation (e.g., cities)
 Executing a series of queries that moves up
the hierarchy (e.g., from states to regions) is
called
10/15/08rolling up Sudarshan 23
Drilling Down
 Drilling down on market: from Region to
State
Sales (Market_Id, Product_Id, Time_Id,
Sales_Amt)
Market (Market_Id, City, State, Region)

– SELECT S.Product_Id, M.Region, SUM


(S.Sales_Amt)
FROM Sales S, Market M
WHERE M.Market_Id = S.Market_Id
GROUP BY S.Product_Id, M.Region

– SELECT S.Product_Id, M.State, SUM


(S.Sales_Amt)
FROM Sales S, Market M
10/15/08 Sudarshan 24
Rolling Up
 Rolling up on market, from State to Region
 If we have already created a table, State_Sales,
State_Sales using

1. SELECT S.Product_Id, M.State, SUM


(S.Sales_Amt)
FROM Sales S, Market M
WHERE M.Market_Id = S.Market_Id
GROUP BY S.Product_Id, M.State
then we can roll up from there to:

2. SELECT T.Product_Id, M.Region, SUM


(T.Sales_Amt)
FROM State_Sales T, Market M
WHERE M.State = T.State
GROUP BY T.Product_Id, M.Region
10/15/08 Sudarshan 25
Roll-up and Drill Down

Higher Level of
Aggregation

Drill-Down
 Sales Channel
 Region
Roll Up

 Country
 State
 Location Address
 Sales
RepresentativeLow-level
Details
10/15/08 Sudarshan 26
“Slicing and Dicing”

The Telecomm Slice


Product

Household

Telecomm n s
i o
e g
Video R Europe
Far East
Audio India

Retail Direct Special Sales Channel


10/15/08 Sudarshan 27
Snowflake Schema
 A snowflake schema is a term that
describes a star schema structure
normalized through the use of outrigger
tables. i.e dimension table hierarchies
are broken into simpler tables. In
star schema example we had 4
dimensions like location, product, time,
organization and a fact table (sales)

10/15/08 Sudarshan 28
Snowflake schema
 Represent dimensional hierarchy
directly by normalizing tables.
 Easy to maintain and saves storage

T date, custno, prodno, cityname, ... p


i r
m o
e f d
a
c c c r
u t e
i
g
s t i
t y o
10/15/08 Sudarshan n29
Example of Snowflake Schema

10/15/08 Sudarshan 30
Schema

time
time_key item
day item_key supplier
day_of_the_week Sales Fact Table item_name supplier_key
month brand supplier_type
quarter time_key type
year item_key supplier_key

branch_key
location
branch location_key
location_key
branch_key
units_sold street
branch_name
city_key city
branch_type
dollars_sold
city_key
avg_sales city
province_or_street
Measures country

10/15/08 Sudarshan 31
Indexing Techniques
 Exploiting indexes to reduce
scanning of data is of crucial
importance
 Bitmap Indexes
 Join Indexes
 Other Issues
 Text indexing
 Parallelizing and sequencing of
index builds and incremental
updates

10/15/08 Sudarshan 32
Indexing Techniques
 Bitmap index:
 Index on a particular column
Each value in the column has a bit
vector: bit-op is fast
The length of the bit vector: # of
records in the base table
The i-th bit is set if the i-th row of the
base table has the value for the indexed
column
not suitable for high cardinality
domains
10/15/08 Sudarshan 33
BitMap Indexes
 Example: the attribute sex has values M and F.
A table of 100 million people needs 2 lists of
100 million bits

10/15/08 Sudarshan 34
Bitmap Index

0 1 0
M Y
1 1 1
F Y
1 0 0
F N
0 0 0
M N
1 1 1
F Y
1 0 0
F N

Customer Query : select * from customer where


gender = ‘F’ and vote = ‘Y’

10/15/08 Sudarshan 35
Bit Map Index

Base Table Region Index Rating Index


Cust Region Rating Row ID N S E W Row ID H M L
C1 N H 1 1 0 0 0 1 1 0 0
C2 S M 2 0 1 0 0 2 0 1 0
C3 W L 3 0 0 0 1 3 0 0 0
C4 W H 4 0 0 0 1 4 0 0 0
C5 S L 5 0 1 0 0 5 0 1 0
C6 W L 6 0 0 0 1 6 0 0 0
C7 N H 7 1 0 0 0 7 1 0 0

Customers where Region = W And Rating = M

10/15/08 Sudarshan 36
BitMap Indexes
 Comparison, join and aggregation operations
are reduced to bit arithmetic with dramatic
improvement in processing time
 Significant reduction in space and I/O (30:1)
 Adapted for higher cardinality domains as well.
 Compression (e.g., run-length encoding)
exploited
 Products that support bitmaps: Model 204,
TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3

10/15/08 Sudarshan 37
Join Indexes
 Pre-computed joins
 A join index between a fact table and a dimension
table correlates a dimension tuple with the fact
tuples that have the same value on the common
dimensional attribute
 e.g., a join index on city dimension of calls fact table
 correlates for each city the calls (in the calls table) from
that city

10/15/08 Sudarshan 38
Join Indexes
 Join indexes can also span
multiple dimension tables
 e.g., a join index on city and time
dimension of calls fact table

10/15/08 Sudarshan 39
Star Join Processing
 Use join indexes to join dimension
and fact table

Calls
C+T

Time
C+T+L

Loca-
tion C+T+L
Plan +P

10/15/08 Sudarshan 40
Bitmapped Join Processing

Bitmaps
1
Time Calls
0
1

Loca- 0

tion Calls 0
1 AND

Plan Calls
1
1
0

10/15/08 Sudarshan 41
OLAP Is FASMI
 Fast
 Analysis
 Shared
 Multidimensional
 Information

Nigel Pendse, Richard Creath - The OLAP Report

10/15/08 Sudarshan 42
Warehouse Products
 Computer Associates -- CA-Ingres
 Hewlett-Packard -- Allbase/SQL
 Informix -- Informix, Informix XPS
 Microsoft -- SQL Server
 Oracle -- Oracle7, Oracle Parallel Server
 Red Brick -- Red Brick Warehouse
 SAS Institute -- SAS
 Software AG -- ADABAS
 Sybase -- SQL Server, IQ, MPP

10/15/08 Sudarshan 43
Products
 Oracle 8
 Informix
 Online Dynamic Server
 XPS --Extended Parallel Server
 Universal Server for object relational
applications
 Sybase
 Adaptive Server 11.5
 Sybase MPP
 Sybase IQ

10/15/08 Sudarshan 44
Warehouse Server
Products
 Red Brick Warehouse
 Tandem Nonstop
 IBM
 DB2 MVS
 Universal Server
 DB2 400
 Teradata

10/15/08 Sudarshan 45

Vous aimerez peut-être aussi