
Designing the data warehouse / data marts
Methodologies and Techniques
Data Warehouse Design
Data warehouses are constructed in a heuristic manner, where each phase depends on the previous phase.
First, one portion of the data is populated.
It is then used, and the next portion is populated based on feedback from the end user.
Life cycle of the DW
Data moves from the operational databases into the warehouse database in stages:
First-time load (the warehouse database is very small at first)
Refresh
Then put in some more data
Refresh
Purge or archive
Refresh
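The following is a minimal Python sketch of this life cycle. It assumes an in-memory list stands in for the warehouse database. The `Warehouse` class, its `load` and `purge_older_than` methods, and the sample rows are hypothetical. The sketch shows a small first load, a refresh that adds more data, and a purge that moves old rows to an archive.

```python
from datetime import date


class Warehouse:
    """Hypothetical in-memory stand-in for a warehouse database."""

    def __init__(self):
        self.rows = []     # data currently in the warehouse
        self.archive = []  # rows purged from the warehouse

    def load(self, rows):
        """First-time load or refresh: append rows taken from operational systems."""
        self.rows.extend(rows)

    def purge_older_than(self, cutoff):
        """Move rows dated before `cutoff` into the archive; return how many moved."""
        old = [r for r in self.rows if r["date"] < cutoff]
        self.rows = [r for r in self.rows if r["date"] >= cutoff]
        self.archive.extend(old)
        return len(old)


wh = Warehouse()
wh.load([{"date": date(2020, 1, 31), "sales": 100}])   # first-time load: very small
wh.load([{"date": date(2020, 2, 29), "sales": 120},     # refresh: put in more data
         {"date": date(2020, 3, 31), "sales": 90}])
moved = wh.purge_older_than(date(2020, 3, 1))           # purge or archive
print(moved, len(wh.rows), len(wh.archive))             # 2 1 2
```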
Oracle Warehouse
Figure: Any Source (operational data, external data), Components, Any Data (relational, multidimensional, text, image, spatial, audio, video), Any Access (relational tools, OLAP tools, applications/Web).
Benefits of an Incremental Approach
Delivers a strategic data warehouse solution through incremental development efforts
Provides an extensible, scalable architecture
Quickly provides business benefits and ensures a much earlier return on investment
Allows a data warehouse to be built one subject or application area at a time
Allows the construction of an integrated data mart environment
Data Mart
A subset of a data warehouse that supports the requirements of a particular department or business function.

Characteristics include:
Unlike data warehouses, data marts do not normally contain detailed operational data.
They may contain certain levels of aggregation.
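This sketch illustrates the two characteristics above. It assumes the warehouse holds detailed sales rows. The field names (`dept`, `month`, `amount`) and the `build_data_mart` function are hypothetical. The mart keeps only one department's data and stores monthly totals instead of individual sales.

```python
from collections import defaultdict

# Hypothetical warehouse rows: detailed sales for every department.
warehouse_sales = [
    {"dept": "toys",  "month": "2024-01", "amount": 40.0},
    {"dept": "toys",  "month": "2024-01", "amount": 60.0},
    {"dept": "books", "month": "2024-01", "amount": 25.0},
    {"dept": "toys",  "month": "2024-02", "amount": 30.0},
]


def build_data_mart(rows, dept):
    """Keep one department's rows and aggregate them to monthly totals."""
    totals = defaultdict(float)
    for r in rows:
        if r["dept"] == dept:                    # subset: one business function
            totals[r["month"]] += r["amount"]    # aggregation: no per-sale detail
    return dict(totals)


toys_mart = build_data_mart(warehouse_sales, "toys")
print(toys_mart)  # {'2024-01': 100.0, '2024-02': 30.0}
```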
Data Model
A data model is a graphical view of data created for analysis and
design purposes.
A data model is a collection of concepts that can be used to describe the
structure of a database.
A database model is a type of data model that determines the logical structure of a database and fundamentally determines the manner in which data can be stored, organized, and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.
Levels of Data Modeling:
There are three levels of data modeling.
Conceptual Data Model (high level, ERD): based on entities and relationships. It provides concepts that are close to the way many users perceive data. It describes the problem.
Logical Data Model (mid level, data item set): each entity in the ERD is further defined by its own data item set (DIS). It illustrates the specific entities, attributes and relationships involved in a business function, and serves as the basis for the creation of the physical data model.
Physical Data Model: looks like a series of tables, sometimes called relational tables. It provides concepts that describe the details of how data is stored in the computer. These concepts are meant for computer specialists, not for typical end users.
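This example illustrates the step from logical to physical model using Python's built-in `sqlite3` module. The entities CUSTOMER and ORDER, their attributes, and the column types are hypothetical. The point is that the physical model adds storage details that a conceptual ERD leaves out: column types, primary keys, and a foreign key for the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Entity CUSTOMER becomes a table with typed columns and a primary key.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
);
-- Entity ORDER ('order' is an SQL keyword, hence the table name).
-- The 'places' relationship becomes a foreign key column.
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL  -- ISO-8601 date string
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Pune')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, '2024-01-15')")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customer', 'customer_order']
```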
ERD Model
Rectangle: represents an entity
Diamond: represents a relationship
Oval: represents an attribute of an entity
Data warehouse modeling includes:
Fact tables and dimension tables
Multidimensional model / star schema
Support for roll-up, drill-down, and pivot analysis
Conceptual, logical and physical data models
Normalization and denormalization
Model granularity: level of detail
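The roll-up, drill-down and pivot operations listed above can be sketched over a tiny fact table in plain Python. The fact rows (year, quarter, region, sales) and the `aggregate` helper are hypothetical. A real warehouse would run these operations as SQL `GROUP BY` queries or in an OLAP tool.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales amount).
facts = [
    (2023, "Q1", "North", 100), (2023, "Q1", "South", 80),
    (2023, "Q2", "North", 120), (2023, "Q2", "South", 70),
    (2024, "Q1", "North", 130), (2024, "Q1", "South", 90),
]


def aggregate(rows, key):
    """Sum the measure (last field) grouped by the dimension values key(row)."""
    out = defaultdict(int)
    for row in rows:
        out[key(row)] += row[-1]
    return dict(out)


# Drill down: the finer level of detail (year and quarter).
by_quarter = aggregate(facts, lambda r: (r[0], r[1]))
# Roll up: climb the time hierarchy from quarter to year.
by_year = aggregate(facts, lambda r: r[0])
# Pivot: rotate the view so that years are rows and regions are columns.
pivot = defaultdict(dict)
for (year, region), total in aggregate(facts, lambda r: (r[0], r[2])).items():
    pivot[year][region] = total

print(by_year)      # {2023: 370, 2024: 220}
print(dict(pivot))  # {2023: {'North': 220, 'South': 150}, 2024: {'North': 130, 'South': 90}}
```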
Normalization
Normalization is a process of reducing redundancies of data in a database.
Normalization is a technique used when designing and redesigning a database: a process or set of guidelines used to optimally design a database to reduce redundant data. The actual guidelines of normalization are called normal forms.
The First Normal Form
The objective of the first normal form is to divide the
base data into logical units called tables. When each
table has been designed, a primary key is assigned
to most or all tables.
The Second Normal Form
The objective of the second normal form is to take data
that is only partly dependent on the primary key and
enter that data into another table.
The second normal form is derived from the first normal form by further breaking the tables down into more specific units.
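This small sketch shows the split. It assumes a hypothetical order-lines table whose primary key is the pair (`order_id`, `item_id`). Because `item_name` depends only on `item_id`, it is moved into a separate items table keyed by `item_id`, so each name is stored once.

```python
# Hypothetical table in first normal form. Primary key: (order_id, item_id).
# item_name depends only on item_id, i.e. only partly on the primary key.
order_lines = [
    {"order_id": 1, "item_id": "A", "item_name": "Pen",    "qty": 3},
    {"order_id": 1, "item_id": "B", "item_name": "Pencil", "qty": 1},
    {"order_id": 2, "item_id": "A", "item_name": "Pen",    "qty": 5},
]


def to_second_normal_form(rows):
    """Move the partly dependent column (item_name) into its own table."""
    items = {r["item_id"]: r["item_name"] for r in rows}           # key: item_id
    lines = [{"order_id": r["order_id"], "item_id": r["item_id"], "qty": r["qty"]}
             for r in rows]                                         # key: (order_id, item_id)
    return lines, items


lines, items = to_second_normal_form(order_lines)
print(items)  # {'A': 'Pen', 'B': 'Pencil'}
```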
The Third Normal Form
The third normal form's objective is to remove data in a table that is not dependent on the primary key.
Another table was created to demonstrate the third normal form. EMPLOYEE_PAY_TBL is split into two tables, one containing the actual employee pay information and the other containing the position descriptions, which really do not need to reside in EMPLOYEE_PAY_TBL. The POSITION_DESC column does not depend on the primary key, EMP_ID; it describes a position, not an employee.
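The split described above can be sketched with Python's `sqlite3` module. Only EMPLOYEE_PAY_TBL, EMP_ID and POSITION_DESC come from the text. The POSITION and PAY_RATE columns, the POSITIONS_TBL table and the sample rows are assumptions for illustration. POSITION_DESC moves to a table keyed by POSITION, and a join recovers it when needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: POSITION_DESC is repeated for every employee in the same position.
CREATE TABLE EMPLOYEE_PAY_TBL (
    EMP_ID        TEXT PRIMARY KEY,
    POSITION      TEXT,
    POSITION_DESC TEXT,
    PAY_RATE      REAL
);
INSERT INTO EMPLOYEE_PAY_TBL VALUES
    ('E1', 'MGR', 'Manages a department', 40.0),
    ('E2', 'CLK', 'Handles records',      18.5),
    ('E3', 'CLK', 'Handles records',      19.0);

-- After: descriptions live in their own table, stored once per position.
CREATE TABLE POSITIONS_TBL (
    POSITION      TEXT PRIMARY KEY,
    POSITION_DESC TEXT
);
INSERT INTO POSITIONS_TBL
    SELECT DISTINCT POSITION, POSITION_DESC FROM EMPLOYEE_PAY_TBL;

-- Rebuild EMPLOYEE_PAY_TBL without the POSITION_DESC column.
CREATE TABLE EMPLOYEE_PAY_NEW (
    EMP_ID   TEXT PRIMARY KEY,
    POSITION TEXT REFERENCES POSITIONS_TBL(POSITION),
    PAY_RATE REAL
);
INSERT INTO EMPLOYEE_PAY_NEW SELECT EMP_ID, POSITION, PAY_RATE FROM EMPLOYEE_PAY_TBL;
DROP TABLE EMPLOYEE_PAY_TBL;
ALTER TABLE EMPLOYEE_PAY_NEW RENAME TO EMPLOYEE_PAY_TBL;
""")

# A join brings the description back when a query needs it.
rows = conn.execute("""
    SELECT e.EMP_ID, p.POSITION_DESC
    FROM EMPLOYEE_PAY_TBL e JOIN POSITIONS_TBL p ON e.POSITION = p.POSITION
    ORDER BY e.EMP_ID""").fetchall()
print(rows)  # [('E1', 'Manages a department'), ('E2', 'Handles records'), ('E3', 'Handles records')]
```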
Denormalizing
Denormalization reduces query time because the data is spread across fewer tables.
The output of the data modeling process is a series of tables, each containing keys and attributes.
There is nothing wrong with having many tables, but there is a problem from a performance perspective: a program must jump between many tables, which consumes a lot of time.
In denormalization, tables are merged. For example, monthly values can be stored as an array (one column per month) in a single row, so less time is consumed retrieving them.
Denormalization is the process of taking a normalized database and modifying table structures to allow controlled redundancy for increased database performance.

Denormalization
Attempting to improve performance is the only reason to
ever denormalize a database.
A denormalized database is not the same as a database
that has not been normalized.
Denormalization may involve recombining separate tables
or creating duplicate data within tables to reduce the
number of tables that need to be joined to retrieve the
requested data, which results in less I/O and CPU time.
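The "array by month" idea can be sketched with `sqlite3`. The table names, the account `ACC1` and the amounts are hypothetical. Twelve normalized monthly rows are pivoted once, at load time, into a single row with one column per month, so reading a year of data becomes a single-row lookup. The redundancy is controlled because the wide table is derived from the normalized one and can be rebuilt from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized: one row per account per month.
conn.execute("CREATE TABLE monthly_sales (account TEXT, month INTEGER, amount REAL)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)",
                 [("ACC1", m, 100.0 + m) for m in range(1, 13)])

# Denormalized: an 'array' of twelve monthly columns in a single row.
month_cols = [f"m{m:02d}" for m in range(1, 13)]
conn.execute("CREATE TABLE sales_by_month (account TEXT PRIMARY KEY, "
             f"{', '.join(c + ' REAL' for c in month_cols)})")

# Pivot rows into columns once, at load time, with conditional aggregation.
select_cols = ", ".join(f"SUM(CASE WHEN month = {m} THEN amount END)"
                        for m in range(1, 13))
conn.execute(f"INSERT INTO sales_by_month SELECT account, {select_cols} "
             "FROM monthly_sales GROUP BY account")

# A year of data is now one row instead of twelve.
row = conn.execute("SELECT * FROM sales_by_month WHERE account = 'ACC1'").fetchone()
print(row[1], row[12])  # 101.0 112.0
```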
The next-generation techniques
1. Meta data
2. Managing reference tables
Meta Data
Meta data contains:
The structure of the data as known to the programmer
Transformation of data
Data model
History of extracts
Managing reference tables
A reference table is a table that holds the predefined set of values a field may take. For example, in a relational database model of a warehouse, the entity 'Item' may have a field called 'status' with a predefined set of values such as 'sold', 'reserved', 'out of stock'. To achieve database normalization, these values are placed in their own reference table called 'status'.
Reference tables of this type are often ignored, which can be very harmful in the future.
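This sketch builds such a reference table with `sqlite3`, using hypothetical `status` and `item` tables. The allowed status values live in their own table, and a foreign key rejects any value that is not in it. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Reference table: the predefined set of values for an item's status.
CREATE TABLE status (status_code TEXT PRIMARY KEY);
INSERT INTO status VALUES ('sold'), ('reserved'), ('out of stock');

-- The Item entity points at the reference table instead of storing free text.
CREATE TABLE item (
    item_id     INTEGER PRIMARY KEY,
    status_code TEXT NOT NULL REFERENCES status(status_code)
);
""")


def add_item(item_id, status_code):
    """Insert an item; return False when the status is not in the reference table."""
    try:
        conn.execute("INSERT INTO item VALUES (?, ?)", (item_id, status_code))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_item(1, "reserved"))  # True
print(add_item(2, "lost"))      # False
```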
Triggering the Data Warehouse Records
Triggers (commands) are written to take snapshots for the data warehouse.
For example, snapshot records are taken automatically at the end of every day, week, or month; this is called a time-generated event.
Taking snapshots automatically in this way is called triggering the data warehouse records.
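The following is a minimal sketch of a time-generated snapshot. It assumes a scheduler, such as a nightly job, would call `end_of_day_snapshot`; here the calls are made directly. The balances, account names and function are hypothetical. Each call copies the current operational state into the warehouse, stamped with its date.

```python
from datetime import date, timedelta

# Hypothetical operational state that changes during the day.
account_balances = {"ACC1": 500.0, "ACC2": 120.0}
warehouse = []  # snapshot records accumulate here, one set per trigger


def end_of_day_snapshot(as_of):
    """Time-generated event: copy the current state into the warehouse with its date."""
    for account, balance in account_balances.items():
        warehouse.append({"snapshot_date": as_of, "account": account, "balance": balance})


# Simulate the trigger firing at the end of two consecutive days.
day = date(2024, 3, 1)
end_of_day_snapshot(day)
account_balances["ACC1"] -= 75.0                # activity during the next day
end_of_day_snapshot(day + timedelta(days=1))
print(len(warehouse), warehouse[-2]["balance"])  # 4 425.0
```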
Profile records
A profile record summarizes the activity of an individual.
For example, a customer calls every day. Instead of entering the name with every call, all of the customer's calls are grouped into one record in which the name is written only once.
That record becomes the individual's profile record.
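This sketch shows how call records could be grouped into profile records. It assumes hypothetical detail rows of (customer name, call date, minutes) and a hypothetical `build_profiles` function. Each customer ends up with one record that holds the name once, plus a call count and total minutes. The result also has fewer rows than the detail.

```python
from collections import defaultdict

# Hypothetical detail records: one row per call, repeating the customer name.
calls = [
    ("Ravi", "2024-05-01", 4), ("Ravi", "2024-05-02", 7),
    ("Meena", "2024-05-01", 2), ("Ravi", "2024-05-03", 1),
]  # (customer name, call date, minutes)


def build_profiles(detail):
    """Group each customer's calls into one profile record (name stored once)."""
    profiles = defaultdict(lambda: {"calls": 0, "minutes": 0})
    for name, _day, minutes in detail:
        profiles[name]["calls"] += 1
        profiles[name]["minutes"] += minutes
    return dict(profiles)


profiles = build_profiles(calls)
print(profiles["Ravi"])  # {'calls': 3, 'minutes': 12}
```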
Managing volume
The volume of data in a data warehouse is very large, which makes it very difficult to manage.
To solve this problem, profile records are created so that the data can be managed more easily.
Create multiple profile records
Multiple profile records can be created from the same detail data.
For example, if a profile record is available for each individual, district profile records can also be created from the same detail.
The meta data used for profile records is not really meta data but a meta process.
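A profile record is an aggregate of detail records, so the same detail can feed several kinds of profile. This sketch assumes hypothetical (customer, district, amount) rows and a hypothetical `profile_by` helper. Grouping by customer gives individual profiles, and grouping by district gives district profiles.

```python
from collections import defaultdict

# Hypothetical detail records: (customer, district, amount spent).
detail = [
    ("Ravi", "North", 40), ("Ravi", "North", 10),
    ("Meena", "North", 25), ("Joe", "South", 30),
]


def profile_by(rows, key_index):
    """Build one profile record per value of the chosen field by summing amounts."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


individual_profiles = profile_by(detail, 0)  # {'Ravi': 50, 'Meena': 25, 'Joe': 30}
district_profiles = profile_by(detail, 1)    # {'North': 75, 'South': 30}
```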
Access of the Data Warehouse
Access to the data warehouse can be:
Direct access
Indirect access
Direct access
A request is made within the operational environment for data that resides in the data warehouse.
The request is transferred to the data warehouse environment, and the data located in the data warehouse is transferred to the operational system.
For this transfer, the technology of the operational and data warehouse environments must be compatible.
Figure: the legacy system sends a query to the data warehouse and receives the result of the query.
Indirect Data Access
One of the most effective uses of the data warehouse is indirect access of the data warehouse by the operational environment.
Example: an airline commission calculation system.
If the airline gives an agent too little commission, the agent may book with another airline; if it gives too much commission, the airline can make a loss.
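Figure: airline commission calculation. A travel agent works with an airline reservation clerk. The calculation uses the current booking and flight status from the operational environment, together with historical booking data summarized as the average booking for a flight date.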
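This sketch illustrates indirect access. It assumes the warehouse periodically computes an average booking per flight from historical bookings and ships that small summary to the operational environment. The flight codes, seat counts and commission rule (a bonus while a flight books below its historical average) are hypothetical. The point is that the operational system uses warehouse-derived data without querying the warehouse while the agent waits.

```python
from statistics import mean

# Data warehouse side, run periodically (e.g. overnight): summarize history.
historical_bookings = {          # hypothetical: flight -> seats booked on past dates
    "AI101": [150, 170, 160],
    "AI202": [80, 90, 100],
}
avg_booking = {flight: mean(seats) for flight, seats in historical_bookings.items()}


# Operational side: the commission calculation reads only the small summary.
def commission_percent(flight, current_booking, base=5, bonus=2):
    """Hypothetical rule: offer a bonus while a flight books below its average."""
    if current_booking < avg_booking[flight]:
        return base + bonus
    return base


print(commission_percent("AI101", 120))  # 7
print(commission_percent("AI202", 95))   # 5
```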
Star Schema
Snowflake Schema
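A star schema joins the fact table directly to denormalized dimension tables. A snowflake schema normalizes a dimension into further tables, at the cost of extra joins. This sketch uses `sqlite3` with hypothetical tables and rows to build both schemas and show that they answer the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: the fact table joins directly to a denormalized dimension.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT            -- category name kept inside the dimension
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    sale_date  TEXT,
    amount     REAL
);

-- Snowflake schema: the same dimension normalized into a category table.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category    TEXT
);
CREATE TABLE dim_product_sf (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Mug', 'Kitchen');
INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Kitchen');
INSERT INTO dim_product_sf VALUES (1, 'Pen', 10), (2, 'Mug', 20);
INSERT INTO fact_sales VALUES (1, '2024-01-02', 3.0), (2, '2024-01-02', 8.0),
                              (1, '2024-01-03', 2.0);
""")

star = conn.execute("""
    SELECT p.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category ORDER BY p.category""").fetchall()
snowflake = conn.execute("""
    SELECT c.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_sf p ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id  -- the extra join
    GROUP BY c.category ORDER BY c.category""").fetchall()
print(star == snowflake, star)  # True [('Kitchen', 8.0), ('Stationery', 5.0)]
```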
Triggering the Data Warehouse
The basic business interaction that causes the data warehouse to become populated with data can be called an EVENT/SNAPSHOT interaction.
The event that triggers a snapshot might be the occurrence of some notable activity, such as the making of a sale or the delivery of a shipment.
It can also be time-related: at the end of the day or at the end of the month, a snapshot goes to the data warehouse through triggers.
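In contrast to a time-related trigger, an event-triggered snapshot fires on the activity itself. This minimal sketch assumes a hypothetical `on_business_event` hook that the operational system calls after a sale or a delivery. The event names, keys and fields are illustrative only.

```python
from datetime import datetime

warehouse = []  # hypothetical store of snapshot records


def on_business_event(event_type, key, occurred_at, **details):
    """Event-triggered snapshot: each notable activity (a sale, a shipment
    delivery) writes one time-stamped record into the warehouse."""
    warehouse.append({"time": occurred_at, "event": event_type, "key": key, **details})


on_business_event("sale", "ORD-1", datetime(2024, 6, 1, 10, 30), amount=250.0)
on_business_event("delivery", "SHP-7", datetime(2024, 6, 1, 16, 5), carrier="XYZ")
print([r["event"] for r in warehouse])  # ['sale', 'delivery']
```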
Profile Record (multiple events)
Profile records represent multiple events.
A profile record is created from many detailed records.
For example, a bank may take all the monthly activities of a customer and create an aggregate data warehouse record that represents all of the customer's banking activities for the month.
