
Data Warehouse Testing

Agenda
What is an Operational Database/System?
Introduction of Data warehouse
Data warehouse Architecture
Data Extraction, Data Transformation, Data Loading & Data Mining
Characteristics of a data warehouse
What is OLTP?
What is OLAP?
Difference between OLTP & OLAP
What is DSS?

Operational Database/System
An operational database, as the name implies, is the database that is currently and continuously in use, capturing real-time data and supplying it for real-time computations and other analytical processes.
The operational database is the source of data for the data warehouse. It contains the detailed data used to run the day-to-day operations of the business.
The data changes continually as updates are made and reflects the current value of the last transaction.
An operational database contains enterprise data that is up to date and modifiable.

Operational Database/System

In an enterprise data management system, an operational database can be seen as the counterpart of a decision support database, which contains non-modifiable data extracted for the purpose of statistical analysis.
For example, an operational database is the one used for taking orders and fulfilling them in a store, whether it is a traditional store or an online store.
An operational database is also used for keeping track of payments and inventory. It records credit card details and amounts, and accountants rely on it because the books must balance to the last penny.
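To make the order-taking and inventory example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and the place_order helper are hypothetical and chosen only for illustration; a real operational system would be far more involved.

```python
import sqlite3

def place_order(conn, product, quantity):
    """Record an order and decrement stock as one unit of work."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT stock FROM inventory WHERE product = ?", (product,)
        ).fetchone()
        if row is None or row[0] < quantity:
            raise ValueError("unknown product or not enough stock")
        # The stock value is overwritten in place: only the current value is kept.
        conn.execute(
            "UPDATE inventory SET stock = stock - ? WHERE product = ?",
            (quantity, product),
        )
        conn.execute(
            "INSERT INTO orders (product, quantity) VALUES (?, ?)",
            (product, quantity),
        )

# Hypothetical in-memory operational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product TEXT PRIMARY KEY, stock INTEGER)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)"
)
conn.execute("INSERT INTO inventory VALUES ('pen', 10)")
conn.commit()

place_order(conn, "pen", 3)
current_stock = conn.execute(
    "SELECT stock FROM inventory WHERE product = 'pen'"
).fetchone()[0]
print(current_stock)  # 7
```

The stock column always holds the latest value, which is what "reflects the current value of the last transaction" means in practice.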

Introduction of Data warehouse


A data warehouse is a storage area where an organization's information is stored and managed so that users across the organization can use that data in their decision-making process.
It is an information warehouse which:
Collects data from various operational data sources.
Integrates data into a logical business model.
Stores data in an understandable and easily accessible way.
Delivers information to decision makers across the organization through various reports and analysis tools.
In the end, a data warehouse is itself a database.

High-Level Data Warehouse Architecture

[Figure: high-level data warehouse architecture]

Data Warehouse Architecture

[Figure: data flow diagram]

Data Extraction:
Data from different source systems is converted into
one consolidated data warehouse format which is ready
for transformation processing.
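As a rough illustration of extraction, the sketch below reads two hypothetical sources with different layouts and stages them in one common record format. The field names, sample data and function names are assumptions made for this example, not part of any particular tool.

```python
import csv
import io

# Hypothetical export from a store system (CSV text).
STORE_CSV = "cust_id,full_name,amount\n1,Ann Lee,20.5\n2,Bob Ray,5\n"

# Hypothetical records from an online shop (different field names and types).
WEB_ROWS = [{"customerId": 3, "name": "Cy Diaz", "total": "12.00"}]

def extract_store(text):
    """Read the CSV export and emit records in the common staging format."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(row["cust_id"]),
            "customer_name": row["full_name"],
            "amount": float(row["amount"]),
            "source": "store",
        }

def extract_web(rows):
    """Map the online shop's field names onto the same staging format."""
    for row in rows:
        yield {
            "customer_id": int(row["customerId"]),
            "customer_name": row["name"],
            "amount": float(row["total"]),
            "source": "web",
        }

# One consolidated list, ready for transformation processing.
staged = list(extract_store(STORE_CSV)) + list(extract_web(WEB_ROWS))
```

Every staged record has the same keys regardless of where it came from, so later steps do not need to know about the source layouts.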
Data Transformation:
Transforming the data may involve the following tasks:
Applying business rules (for example, calculating new measures and dimensions)
Cleaning (for example, mapping NULL to 0, or "Male" to "M" and "Female" to "F")
Filtering (for example, selecting only certain columns to load)
Splitting a column into multiple columns and vice versa
Joining together data from multiple sources (for example, lookup, merge)
Transposing rows and columns
Applying any kind of simple or complex data validation (for example, if the first three columns in a row are empty, reject the row from processing)
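Several of the tasks listed above can be sketched in a few lines of plain Python. The column names and sample rows below are hypothetical, and the rules shown are only the examples given in the list (NULL to 0, gender codes, rejecting rows whose first three columns are empty).

```python
GENDER_CODES = {"Male": "M", "Female": "F"}

def transform(rows):
    """Apply a few of the listed transformation tasks to raw rows (dicts)."""
    out = []
    for row in rows:
        # Validation: reject the row if its first three columns are all empty.
        first_three = list(row.values())[:3]
        if all(value in (None, "") for value in first_three):
            continue
        # Splitting: one "name" column becomes two columns.
        first, _, last = (row["name"] or "").partition(" ")
        # Cleaning: NULL -> 0 and "Male"/"Female" -> "M"/"F".
        quantity = row["quantity"] if row["quantity"] is not None else 0
        gender = GENDER_CODES.get(row["gender"], row["gender"])
        # Business rule: derive a new measure from existing ones.
        revenue = quantity * row["unit_price"]
        # Filtering: only the selected columns are kept for loading.
        out.append({
            "first_name": first,
            "last_name": last,
            "gender": gender,
            "quantity": quantity,
            "revenue": revenue,
        })
    return out

raw = [
    {"name": "Ann Lee", "gender": "Female", "quantity": 2, "unit_price": 3.0},
    {"name": "Bob Ray", "gender": "Male", "quantity": None, "unit_price": 4.0},
    {"name": "", "gender": None, "quantity": None, "unit_price": 1.0},
]
clean = transform(raw)  # the third row is rejected by the validation rule
```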

Data Loading:
Loading data into the data warehouse.
End users directly access data derived from several source systems through the data warehouse.
OLAP (Online Analytical Processing) tools are used heavily by organizations to discover valuable business trends from data marts and data warehouses.
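The loading step can be pictured as a bulk insert followed by analytical queries over the loaded rows. This is a small sketch with an in-memory SQLite database standing in for the warehouse; the sales table and its rows are invented for the example.

```python
import sqlite3

# Hypothetical rows that have already been extracted and transformed.
rows = [
    ("store", "pen", 2, 6.0),
    ("web", "pen", 1, 3.0),
    ("web", "ink", 4, 20.0),
]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (source TEXT, product TEXT, quantity INTEGER, revenue REAL)"
)

# Load the whole batch in one transaction.
with warehouse:
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# End users then query data that originated in several source systems.
revenue_by_product = dict(
    warehouse.execute("SELECT product, SUM(revenue) FROM sales GROUP BY product")
)
print(revenue_by_product)
```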

Data Mining:
Data mining, the extraction of hidden predictive information from large databases, is the process of analyzing data from different perspectives and summarizing it into useful information.
Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.
Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
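As a very small illustration of "finding correlations among fields", the sketch below computes the Pearson correlation for every pair of fields and reports the strongest one. The field names and values are made up, and real data mining tools use far richer techniques than this.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical fields taken from a warehouse table, one value per period.
fields = {
    "ad_spend": [1, 2, 3, 4, 5],
    "units_sold": [2, 4, 6, 8, 10],
    "returns": [5, 3, 4, 1, 2],
}

# Correlation for every pair of fields.
correlations = {
    (a, b): pearson(fields[a], fields[b]) for a, b in combinations(fields, 2)
}

# The pair with the strongest relationship, positive or negative.
strongest = max(correlations, key=lambda pair: abs(correlations[pair]))
```

A strong correlation only flags a relationship worth investigating; it does not by itself explain why the fields move together.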

Characteristics of a Data Warehouse


A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decisions.
Subject-oriented Data:
In operational systems, data is stored by individual applications. Each data set has to serve a specific application so that it can perform its specific functions efficiently, so the data sets are organized around that application.
In data warehouses, data is stored not by operational application but by business subject. Business subjects differ from enterprise to enterprise, and they are the subjects that are critical for the enterprise.

Characteristics of a Data Warehouse

Subject Oriented

Operational: application oriented; data is dependent on the application.
Data Warehouse: subject oriented; data is stored subject-wise, independent of application.
Subjects shown: Time, Customer, Products, Location, Sales.

Data Warehouse stores data subject-wise

Integrated Data:
All the relevant data from various applications must be pulled together for proper decision making.
The data in the data warehouse comes from several operational systems. Source data resides in different databases, files and data segments.
Data inconsistencies are removed, and the source data goes through transformation, consolidation and integration before it is stored in the data warehouse.

Characteristics of a Data Warehouse

Integrated

Operational: departmental; data is held within a department (Sales, Finance, Procurement).
Data Warehouse: integrated; data is integrated across the enterprise; one version of truth.

Data Warehouse stores One Version of truth

Nonvolatile Data:
The data in the data warehouse is primarily for query and analysis and is not intended to run the day-to-day business.
The data in a data warehouse is not as volatile as the data in an operational database.
Time-variant Data:
All data in the data warehouse is identified with a particular time period.
The time-variant nature of the data in a data warehouse:
Allows for analysis of the past
Relates information to the present
Enables forecasts for the future
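The contrast between current-value operational data and time-stamped warehouse data can be sketched as follows. The account and account_snapshot tables, the dates and the take_snapshot helper are hypothetical; the point is only that the operational row is overwritten while the warehouse keeps one row per time period.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Operational table: one row per account, always holding the current value.
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
# Warehouse-style table: every row is tied to a particular time period.
db.execute(
    "CREATE TABLE account_snapshot (snapshot_date TEXT, id INTEGER, balance REAL)"
)
db.execute("INSERT INTO account VALUES (1, 100.0)")

def take_snapshot(date):
    """Append the current operational values under the given date."""
    db.execute(
        "INSERT INTO account_snapshot SELECT ?, id, balance FROM account", (date,)
    )

take_snapshot("2024-01-31")
db.execute("UPDATE account SET balance = 250.0 WHERE id = 1")  # old value is lost here
take_snapshot("2024-02-29")

current = db.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
history = db.execute(
    "SELECT snapshot_date, balance FROM account_snapshot "
    "WHERE id = 1 ORDER BY snapshot_date"
).fetchall()
```

Snapshots are only ever appended, never updated, which is also what makes the warehouse side nonvolatile.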

Characteristics of a Data Warehouse

Non-volatile

Operational: insert, change, delete, replace.
Data Warehouse: load, read-only access.

Data Warehouse Is Relatively Static In Nature

Characteristics of a Data Warehouse

Time Variant

Operational: current-value data; time horizon: 60-90 days.
Data Warehouse: snapshot data; time horizon: 5-10 years; stores historical data.

Data Warehouse Typically Spans Across Time

What is an OLTP System?

OLTP (Online Transaction Processing) is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval.
It also refers to computer processing in which the computer responds immediately to user requests.
It is designed to handle large numbers of concurrent users.
Applications which use OLTP include:
Electronic banking
E-commerce
Order processing

OLTP

OLAP
OLAP stands for On-Line Analytical Processing.
OLAP has been growing in popularity due to the increase in data volumes and the recognition of the business value of analytics.
Until the mid-nineties, performing OLAP analysis was an extremely costly process, mainly restricted to larger organizations.
OLAP allows business users to slice and dice data at will.
Normally, data in an organization is distributed across multiple data sources that are incompatible with each other.

Part of the OLAP implementation process involves extracting data from the various data repositories and making it compatible.
OLAP systems are designed to give an overview analysis of what happened.
OLAP provides a historical view of data. Although useful by itself, OLAP analysis becomes truly powerful when combined with predictive analysis from data mining.
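"Slice and dice" can be illustrated without any OLAP product: fix the value of one dimension (slice), then aggregate a measure over whichever dimensions remain of interest. The fact rows and the helper names slice_rows and roll_up below are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (region, product, year) and one measure.
facts = [
    {"region": "North", "product": "pen", "year": 2023, "sales": 10},
    {"region": "North", "product": "ink", "year": 2023, "sales": 4},
    {"region": "South", "product": "pen", "year": 2023, "sales": 7},
    {"region": "South", "product": "pen", "year": 2024, "sales": 9},
]

def slice_rows(rows, **fixed):
    """Slice: keep only rows where the given dimensions have the given values."""
    return [r for r in rows if all(r[dim] == value for dim, value in fixed.items())]

def roll_up(rows, *dims):
    """Aggregate the sales measure over the chosen dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[dim] for dim in dims)] += r["sales"]
    return dict(totals)

# Sales for 2023 only, summarized by region.
sales_2023_by_region = roll_up(slice_rows(facts, year=2023), "region")

# Pen sales only, viewed by region and year.
pen_by_region_year = roll_up(slice_rows(facts, product="pen"), "region", "year")
```

OLAP engines typically precompute or index such aggregates so that the same questions stay fast on much larger data.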

What is an OLAP System?

It is an approach to quickly answering analytical queries that are dimensional in nature.
Databases configured for OLAP employ a multidimensional data model, allowing for complex, analytical and ad-hoc queries with a rapid execution time.
Applications which use OLAP include:
Business reporting for sales
Budgeting and forecasting
Financial reporting

OLAP TOOLS

MicroStrategy, Cognos, Business Objects and SSAS

Data Warehouse (OLAP)

Example question: Can I see the top five selling products region-wise for the last 2 years?

A subject-oriented, integrated, non-volatile, time-variant data store containing detailed and aggregate corporate data.
Data from multiple OLTP sources is integrated across the enterprise.
Data is stored for a longer duration of time.
It is a read-only database: data is always inserted but not modified.

Operational Systems (OLTP)

Example question: What products are to be shipped to the customer as per the order?

An application-oriented, departmental, volatile, current-valued data store containing only detailed raw corporate data.
Data is only for the department and is used in the specific application.
Identical queries may give different results at different times.
Data is stored only for the current period. Old data is either archived or moved to the data warehouse.

An Example

OLTP (current/recent information): Is this medicine available in stock?
OLAP (historical information): Has the incidence of tuberculosis increased in the last 5 years in the Southern region?

OLTP vs. OLAP

Source of data
OLTP: OLTP systems are the original source of the data (operational data).
OLAP: OLAP data comes from the various OLTP databases (consolidation).

Purpose of data
OLTP: To control and run fundamental business tasks.
OLAP: To help with planning, problem solving and decision support.

Inserts and updates
OLTP: Short and fast inserts and updates initiated by end users.
OLAP: Periodic long-running batch jobs refresh the data.

Queries
OLTP: Relatively standardized and simple queries returning relatively few records.
OLAP: Often complex queries involving aggregations.

Processing speed
OLTP: Typically very fast.
OLAP: Depends on the amount of data involved. Batch data refreshes and complex queries may take many hours (indexes are used).

Database design
OLTP: Highly normalized with many tables.
OLAP: Typically de-normalized with fewer tables; use of Star or Snowflake schemas.
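To show what a star schema looks like in practice, the sketch below builds one fact table surrounded by two dimension tables in an in-memory SQLite database and runs a typical aggregating query, in the spirit of the "top selling products region-wise" question. All table names, column names and figures are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables describe the subjects.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
-- The fact table holds the measures plus one key per dimension.
CREATE TABLE fact_sales (product_id INTEGER, region_id INTEGER, year INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'pen'), (2, 'ink');
INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales VALUES
  (1, 1, 2023, 10.0),
  (2, 1, 2023, 4.0),
  (1, 2, 2023, 7.0),
  (2, 2, 2024, 9.0),
  (1, 1, 2024, 5.0);
""")

# Total sales per region and product, best sellers first within each region.
sales_by_region_product = db.execute("""
    SELECT r.name, p.name, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_region  r ON r.region_id  = f.region_id
    GROUP BY r.name, p.name
    ORDER BY r.name, total DESC
""").fetchall()
```

Each dimension is one join away from the fact table, which keeps analytical queries short. A snowflake schema would further normalize the dimension tables into sub-dimensions.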

What is a Decision Support System?

Enterprises are recognizing information as a strategic part of their business.
Data is looked upon as an asset:
To optimize business processes and deliver benefits to the bottom line.
To gain insight from their data for making more tactical decisions, such as what, where, when and how.

DSS

Q&A

Thank You
