Experiment No.02: LAB Manual Part A

LAB Manual
PART A
(PART A: TO BE REFERRED TO BY STUDENTS)
Experiment No.02
Aim: Design a star schema, snowflake schema and fact constellation schema for any subject of
your choice.
Prerequisite:
Fundamental Knowledge of Database Management
Fundamental Knowledge of SQL
Learning Outcomes:
Learning of Star, Snowflake & Fact Constellation (Galaxy) schemas
Theory:
Dimensional modeling:
It is a logical design technique often used for data warehouses. Dimensional
modeling always uses the concepts of facts, measures, and dimensions. Facts are typically (but
not always) numeric values that can be aggregated; dimensions are groups of hierarchies and
descriptors that define the facts. For example, sales amount is a fact; timestamp, product,
register#, store#, etc. are elements of dimensions. Dimensional models are built by business
process area, e.g. store sales, inventory, claims, etc.
Fact table:
The fact table is not a typical relational database table, as it is denormalized on purpose to
enhance query response times. The fact table typically contains records that are ready to
explore, usually with ad hoc queries. Records in the fact table are often referred to as events,
due to the time-variant nature of a data warehouse environment.
The primary key of the fact table is a composite of all its columns except the numeric values
or scores (such as QUANTITY, TURNOVER, or the exact invoice date and time). Typical fact tables in a
global enterprise data warehouse are listed below (there are usually additional company-specific
or business-specific fact tables):
Sales fact table: contains all details regarding sales.
Orders fact table: in some cases the table can be split into open orders and historical orders.
Budget fact table: usually grouped by month and loaded once at the end of a year.
Forecast fact table: usually grouped by month and loaded daily, weekly or monthly.
Inventory fact table: reports stocks, usually refreshed daily.
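The composite key described above can be sketched with Python's built-in sqlite3 module. The table and column names (sales_fact, time_id, product_id, store_id) are illustrative assumptions, not part of the manual; the point is only that the key is formed from the dimension references while the measures stay outside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales fact table: the key is the combination of dimension
# references; quantity and turnover are measures and are not part of the key.
conn.execute("""
    CREATE TABLE sales_fact (
        time_id     INTEGER NOT NULL,
        product_id  INTEGER NOT NULL,
        store_id    INTEGER NOT NULL,
        quantity    INTEGER,
        turnover    REAL,
        PRIMARY KEY (time_id, product_id, store_id)
    )
""")
conn.execute("INSERT INTO sales_fact VALUES (1, 10, 100, 3, 59.97)")

# A second row with the same key combination violates the composite key.
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 10, 100, 1, 19.99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Measures can still be aggregated across rows with different keys.
conn.execute("INSERT INTO sales_fact VALUES (2, 10, 100, 2, 39.98)")
total_quantity = conn.execute(
    "SELECT SUM(quantity) FROM sales_fact"
).fetchone()[0]
```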
Dimension table:
Nearly all of the information in a typical fact table is also present in one or more dimension
tables. The main purpose of maintaining Dimension Tables is to allow browsing the categories
quickly and easily.
The primary keys of each of the dimension tables are linked together to form the composite
primary key of the fact table. In a star schema design, there is only one denormalized table for
a given dimension.
Typical dimension tables in a data warehouse are:
Time dimension table
Customers dimension table
Products dimension table
Key account managers (KAM) dimension table
Sales office dimension table
Star schema architecture:
Star schema architecture is the simplest data warehouse design. The main feature of a star
schema is a table at the center, called the fact table, surrounded by the dimension tables, which
allow browsing of specific categories, summarizing, drill-downs and specifying criteria. Although
the star schema is the simplest data warehouse architecture, it is the one most commonly used
in data warehouse implementations across the world today (about 90-95% of cases).
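As a minimal sketch of a star schema, the example below builds one fact table and two denormalized dimension tables in an in-memory SQLite database, then summarizes the fact by quarter and category. All table names, column names and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one denormalized table per dimension.
    CREATE TABLE time_dim (
        time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT
    );
    CREATE TABLE product_dim (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    -- Fact table at the center, referencing every dimension.
    CREATE TABLE sales_fact (
        time_id      INTEGER REFERENCES time_dim(time_id),
        product_id   INTEGER REFERENCES product_dim(product_id),
        sales_amount REAL,
        PRIMARY KEY (time_id, product_id)
    );
    INSERT INTO time_dim VALUES (1, 2016, 'Q1'), (2, 2016, 'Q2');
    INSERT INTO product_dim VALUES
        (10, 'Pleated pants', 'Pants'), (11, 'Polo shirt', 'Shirts');
    INSERT INTO sales_fact VALUES (1, 10, 100.0), (1, 11, 40.0), (2, 10, 60.0);
""")

# One join per dimension is enough to summarize the fact table.
rows = conn.execute("""
    SELECT t.quarter, p.category, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_id = f.time_id
    JOIN product_dim AS p ON p.product_id = f.product_id
    GROUP BY t.quarter, p.category
    ORDER BY t.quarter, p.category
""").fetchall()
```

Each dimension is reached with a single join from the fact table, which is what keeps star-schema queries short.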
Snowflake Schema architecture:
Snowflake schema architecture is a more complex variation of a star schema design. The main
difference is that dimensional tables in a snowflake schema are normalized, so they have a
typical relational database design.
Snowflake schemas are generally used when a dimension table becomes very big and when a
star schema cannot represent the complexity of a data structure. For example, if a PRODUCT
dimension table contains millions of rows, the use of a snowflake schema should significantly
improve performance by moving some data out to another table (BRANDS, for instance). The
problem is that the more normalized the dimension tables are, the more complicated the SQL joins
needed to query them become. This is because, in order for a query to be answered, many tables
need to be joined and aggregates generated.
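The PRODUCT/BRANDS example can be sketched as follows, again with sqlite3. The schema and data are assumptions made for illustration: brand attributes are moved out of the product dimension into their own table, so a report by brand needs one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Brand data is normalized out of the product dimension.
    CREATE TABLE brand_dim (
        brand_id INTEGER PRIMARY KEY, brand_name TEXT
    );
    CREATE TABLE product_dim (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        brand_id   INTEGER REFERENCES brand_dim(brand_id)
    );
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim(product_id),
        quantity   INTEGER
    );
    INSERT INTO brand_dim VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO product_dim VALUES
        (10, 'Pleated pants', 1), (11, 'Chinos', 1), (12, 'Jeans', 2);
    INSERT INTO sales_fact VALUES (10, 3), (11, 2), (12, 4);
""")

# Reaching the brand name takes two joins: fact -> product -> brand.
quantity_by_brand = dict(conn.execute("""
    SELECT b.brand_name, SUM(f.quantity)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_id = f.product_id
    JOIN brand_dim AS b ON b.brand_id = p.brand_id
    GROUP BY b.brand_name
""").fetchall())
```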
Fact constellation/Galaxy schema Architecture:
For each star schema or snowflake schema it is possible to construct a fact constellation
schema. This schema is more complex than the star or snowflake architecture because it
contains multiple fact tables. This allows dimension tables to be shared amongst many fact
tables.
In a fact constellation schema, each fact table is explicitly assigned to the dimensions
that are relevant for its facts. This may be useful when some facts are associated
with a given dimension level and other facts with a deeper dimension level. This model
is reasonable when, for example, there is a sales fact table (with details down to the
exact date and invoice header id) and a sales forecast fact table that is calculated based
on month, client id and product id.
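A rough sketch of the sales/forecast case: two fact tables at different grains share one product dimension. The names, the ISO date strings and the use of substr to roll dates up to months are assumptions chosen to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shared dimension.
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Fact table 1: one row per invoice line, down to the exact date.
    CREATE TABLE sales_fact (
        invoice_id INTEGER,
        sale_date  TEXT,      -- ISO date, e.g. '2016-08-01'
        product_id INTEGER REFERENCES product_dim(product_id),
        amount     REAL
    );
    -- Fact table 2: one row per month and product.
    CREATE TABLE forecast_fact (
        month           TEXT, -- e.g. '2016-08'
        product_id      INTEGER REFERENCES product_dim(product_id),
        forecast_amount REAL
    );
    INSERT INTO product_dim VALUES (10, 'Pleated pants');
    INSERT INTO sales_fact VALUES
        (1, '2016-08-01', 10, 100.0), (2, '2016-08-15', 10, 50.0);
    INSERT INTO forecast_fact VALUES ('2016-08', 10, 120.0);
""")

# Roll the detailed sales up to the forecast grain and compare the two.
comparison = conn.execute("""
    SELECT p.name, fc.month, fc.forecast_amount, SUM(s.amount)
    FROM forecast_fact AS fc
    JOIN product_dim AS p ON p.product_id = fc.product_id
    LEFT JOIN sales_fact AS s
           ON s.product_id = fc.product_id
          AND substr(s.sale_date, 1, 7) = fc.month
    GROUP BY p.name, fc.month, fc.forecast_amount
""").fetchall()
```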
These dimensions allow us to answer questions such as:
In what regions of the country are pleated pants most popular? (fact table joined with the
product and ship-to dimensions)
What percentage of pants were bought with coupons and how has that varied from quarter to
quarter? (fact table joined with the promotion and time dimensions)
How many pants were sold on holidays versus non-holidays? (fact table joined with the time
dimension)
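The last question above can be sketched as a single join between the fact table and the time dimension. The is_holiday flag and the sample rows are assumptions; the manual does not say how holidays are recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical time dimension with a holiday flag (1 = holiday).
    CREATE TABLE time_dim (
        time_id INTEGER PRIMARY KEY, day TEXT, is_holiday INTEGER
    );
    CREATE TABLE sales_fact (
        time_id  INTEGER REFERENCES time_dim(time_id),
        quantity INTEGER
    );
    INSERT INTO time_dim VALUES
        (1, '2016-01-01', 1), (2, '2016-01-02', 0), (3, '2016-01-03', 0);
    INSERT INTO sales_fact VALUES (1, 5), (2, 2), (3, 4);
""")

# Group the fact rows by the holiday attribute of the time dimension.
pants_sold = dict(conn.execute("""
    SELECT t.is_holiday, SUM(f.quantity)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_id = f.time_id
    GROUP BY t.is_holiday
""").fetchall())
```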
PART B
(PART B : TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the practical
slot. The soft copy must be uploaded on Blackboard, or emailed to the concerned lab in-charge
faculty at the end of the practical in case there is no Blackboard access available.)
Roll No. E059
Class : Btech Comp. E
Date of Experiment: 1/8/16
Grade :
Date of Grading:
Name: Shubham Gupta

Batch : E3
Date of Submission: 8/8/16
Time of Submission:
B.1 Schemas Designed by student:

(Paste your schemas completed during the 2 hours of practical in the lab here)
Star
Time: Year, Quarter, Month, Week, Day, Time_ID
Customers: Customer_ID, Name, Age, Ticket Issued, Contact Details, Category
Airlines: Airlines_ID, Name, Age, Ticket Issued, Contact Details, Category, Membership
Seat: Seat_Number, Type, Cost, Position, Reserved
Transaction: Transaction_ID, Method of Payment, Details of card
Fact Table: Time_Id, Customer_ID, Airlines_ID, Seat_Number, Transaction_ID, Pop_Destn, Ticket_Span, Max_Book, Price_change, Profit, Pop_time
Snowflake
Time: Year, Quarter, Month, Week, Day, Time_ID
Customers: Customer_ID, Name, Age, Ticket Issued, Contact Details, Category
Airlines: Airlines_ID, Name, Age, Ticket Issued, Contact Details, Category, Membership, Seat_number
Seat: Seat_Number, Type, Cost, Position, Reserved
Transaction: Transaction_ID, Method of Payment, Details of card
Fact Table: Time_Id, Customer_ID, Airlines_ID, Seat_Number, Transaction_ID, Pop_Destn, Ticket_Span, Max_Book, Price_change, Profit, Pop_time
Fact Constellation
Time: Year, Quarter, Month, Week, Day, Time_ID
Customers: Customer_ID, Name, Age, Ticket Issued, Contact Details
Airlines: Airlines_ID, Name, Age, Ticket Issued, Contact Details, Category, Membership, Seat_number
Seat: Seat_Number, Cost, Position, Reserved
Seat Type: Seat_type, Privileges
Transaction: Transaction_ID, Method of Payment, Details of card
Fact Table: Time_Id, Customer_ID, Airlines_ID, Seat_Number, Transaction_ID, Pop_Destn, Ticket_Span, Max_Book, Price_change, Profit, Pop_time
Secondary Fact Table: Transaction_ID, Seat_type
Slice
The slice operation selects one particular dimension from a given cube and provides a new sub-cube.
Consider the following diagram that shows how slice works.
Here the slice is performed on the dimension "time" using the criterion time = "Q1".
The result is a new sub-cube in which that dimension is fixed to a single value.
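A minimal sketch of the slice operation in Python, assuming the cube is held as a dictionary keyed by (location, time, item). The figures are invented for illustration.

```python
# Hypothetical sales cube: (location, time, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q2", "Mobile"): 680,
    ("Vancouver", "Q1", "Mobile"): 825,
    ("Vancouver", "Q1", "Modem"): 14,
    ("Chicago", "Q2", "Modem"): 31,
}


def slice_cube(cube, time):
    """Fix the time dimension to one value and drop it from the keys."""
    return {
        (location, item): units
        for (location, t, item), units in cube.items()
        if t == time
    }


# Slice on time = "Q1": the result has only location and item left.
q1 = slice_cube(cube, "Q1")
```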
Dice
Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the
following diagram that shows the dice operation.
The dice operation on the cube based on the following selection criteria involves three dimensions.
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
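The three selection criteria above can be sketched as a filter over a small cube. The cube contents are hypothetical; only the criteria come from the text.

```python
# Hypothetical sales cube: (location, time, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q3", "Mobile"): 700,
    ("Vancouver", "Q2", "Modem"): 14,
    ("Vancouver", "Q1", "Phone"): 400,
    ("Chicago", "Q1", "Mobile"): 440,
}


def dice_cube(cube, locations, times, items):
    """Keep only the cells whose coordinates match all three criteria."""
    return {
        key: units
        for key, units in cube.items()
        if key[0] in locations and key[1] in times and key[2] in items
    }


diced = dice_cube(
    cube,
    locations={"Toronto", "Vancouver"},
    times={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)
```

Unlike a slice, the diced result keeps all three dimensions; it only narrows the values allowed on each.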
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an
alternative presentation of data. Consider the following diagram that shows the pivot operation.
Here the item and location axes of the 2-D slice are rotated.
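A small sketch of the rotation, assuming the 2-D slice is stored as a nested dictionary with items as rows and locations as columns. The values are invented.

```python
# Hypothetical 2-D slice: rows are items, columns are locations.
slice_2d = {
    "Mobile": {"Toronto": 605, "Vancouver": 825},
    "Modem": {"Toronto": 14, "Vancouver": 31},
}


def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


# After the pivot, rows are locations and columns are items.
by_location = pivot(slice_2d)
```

The data is unchanged by the pivot; applying it twice returns the original layout.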
2. Compare Dimensional table with Relational Table

Dimensional table | Relational table
Data is stored in multidimensional tables | Data is stored in an RDBMS
Cubes are the units of storage | Tables are the units of storage
Data is denormalized | Data is normalized
Non-volatile | Volatile
MDX used to manipulate data | SQL used to manipulate data
OLAP reports | Normal reports
Few tables and fact tables | Chain of tables and fact tables
3. What are the advantages of snowflake schema?

Dimension tables are normalized.
Normalized tables are easier to maintain.
In a snowflake schema, a dimension table can have one or more parent tables.
Relationships between dimension tables are represented explicitly.
