Dr DVLN Somayajulu
Professor
Department of Computer Science and Engineering
National Institute of Technology
Warangal
Outline
Data Warehouse Evolution
What is Data Warehouse
Uses of Data Warehouse
Advantages of Data Warehouse
Types of Warehouses
Roadmap to Data Warehousing
Conclusion
What is Data Warehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference
[Figure: data transformed into information]
What are Data Warehouses?
Data warehouses store large volumes of data that are frequently used by decision support systems (DSS)
They are maintained separately from the organization’s operational databases
Data warehouses are relatively static with
only infrequent updates
A data warehouse is a stand-alone repository
of information, integrated from several,
possibly heterogeneous operational
databases
Data Warehousing
Is the enabling technology that
facilitates improved business decision-
making
It’s a process, not a product
A technique for assembling and
managing a wide variety of data from
multiple operational systems for
decision support and analytical
processing
It’s a journey ... not a destination
Data Collection and Database Creation ( 1960s and earlier)
- Primitive File processing
OLTP vs. Warehousing
Organized by transaction vs. organized by subject
Many users vs. few users
Accesses a few records vs. scans entire tables
Small databases vs. large databases
Normalized data structures vs. unnormalized data structures
Continuous updates vs. periodic updates
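The difference in access pattern can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `sales` table, its columns, and the sample rows are assumptions made up for the example, not part of the original slides.

```python
import sqlite3

# Hypothetical in-memory table standing in for both workloads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0), (4, "North", 75.0)],
)

# OLTP-style access: touch one record, identified by its key.
oltp_row = conn.execute(
    "SELECT region, amount FROM sales WHERE order_id = ?", (2,)
).fetchone()

# Warehouse-style access: scan the whole table and summarize by subject.
warehouse_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(warehouse_rows)
```

The first query reads a single row by key, as a transaction would; the second reads every row to produce a summary, which is the typical shape of a decision-support query.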
Normal Reporting Architecture
[Figure: a single source feeding several separate reports]
Examples of OLTP Systems
General ledger
Accounts payable
Financial management
Order processing
Order entry
Inventory
Problems with current reporting structures
Accessibility
Timeliness
Format
Integration
Data Warehouse
– Information provider
– Not an off-the-shelf product
[Figure: users of a data warehouse; labels: Subject Oriented, Integrated]
© Prentice Hall
Subject-oriented
Organized around major subjects
such as customer, supplier, product,
time and sales
Focuses on the modeling and
analysis of data for decision makers.
Provides a simple, concise view of a particular subject by excluding data that is not useful for decision support.
Covers subjects of interest rather
than application areas
[Figure: OLTP systems (Retail Sales System, Outlet Sales System, Catalog Sales System) and operational systems (Equity Plans, Shares, Loans, Savings, Insurance) feeding the Customer Financial Information subject area of the data warehouse]
Subject Areas
– Business area organization
– Typical subject areas
» Customer accounts
» Product sales
» Customer savings
» Toll call usage
» Passenger booking
» Insurance claims
– Model contains measures and analysis criteria
[Figure: Customer Financial Information as a data warehouse subject area]
Subject Oriented
[Figure: order-entry data (Sales Rep, Quantity Sold, Prod Number, Date, Customer Name, Product Description, Unit Price, Mail Address) organized into the subjects Sales, Customers, and Products]
A data warehouse brings together data from various sources and makes it available to users eager to create their own reports
Integrated
Data on a given subject is defined and stored once.
[Figure: data from the Savings and Current Accounts applications integrated with no application flavor]
Integration of Data
Encoding: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y. Integrated as: M, F
Units of measure: Appl. C - pipeline mcf
Naming conventions: Appl. A - bal-on-hand; Appl. B - current_balance; Appl. C - balance. Integrated as: balance
Provides common coding of data
Both within and across subject areas
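Common coding can be sketched as a pair of lookup tables applied while loading. The application names and field names below come from the integration example (Appl. A, B, C; bal-on-hand, current_balance, balance); the direction of the mappings (1 and X meaning M, 0 and Y meaning F), the `gender` key, and the `integrate` helper are assumptions made for illustration.

```python
# Assumed mapping of each application's gender encoding to the warehouse standard (M, F).
GENDER_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"X": "M", "Y": "F"},
}

# Each application's name for the same balance attribute.
BALANCE_FIELDS = {"A": "bal-on-hand", "B": "current_balance", "C": "balance"}


def integrate(app, record):
    """Return the record re-expressed in the warehouse's common coding."""
    return {
        "gender": GENDER_MAPS[app][str(record["gender"])],
        "balance": record[BALANCE_FIELDS[app]],
    }


# Hypothetical source records, one per application.
rows = [
    integrate("A", {"gender": "M", "bal-on-hand": 120.0}),
    integrate("B", {"gender": 0, "current_balance": 80.5}),
    integrate("C", {"gender": "X", "balance": 10.0}),
]
print(rows)
```

Keeping the mappings as data rather than branching code means a new source application only needs a new table entry.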
[Figure: OLTP systems (Retail Sales System, Outlet Sales System, Catalog Sales System) loading time-stamped data into the data warehouse, e.g. 01/97: data for January]
Time Variant
– Historical data
[Figure: bar chart of sales (in lakhs) for the East, West, and North regions, January to March, year 97]
[Figure: warehouse data spanning the calendar years 2000, 2001, and 2002]
[Figure: operational database operations (INSERT, UPDATE, DELETE, Read) compared with warehouse operations (Load, Read)]
Volatility of Data
Volatile (operational): Insert, Change, Delete, Access
Non-Volatile (warehouse): Load, Access
[Figure: the user reads from the warehouse]
OLTP read/write vs.
Data warehouse read-only
Changing Data
First-time load, followed by periodic refreshes
Older data is purged or archived
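The load, refresh, and archive cycle can be sketched with two in-memory lists. This is a minimal sketch, not a real loader: the `warehouse` and `archive` lists, the `load` and `archive_before` helpers, and the sample figures are all hypothetical.

```python
from datetime import date

warehouse = []  # active, read-mostly history
archive = []    # older rows moved out of the active warehouse


def load(rows, snapshot_date):
    """Append a time-stamped batch; existing rows are never updated in place."""
    warehouse.extend({**row, "snapshot": snapshot_date} for row in rows)


def archive_before(cutoff):
    """Move rows older than the cutoff from the warehouse to the archive."""
    archive.extend(r for r in warehouse if r["snapshot"] < cutoff)
    warehouse[:] = [r for r in warehouse if r["snapshot"] >= cutoff]


load([{"region": "East", "sales": 10}], date(2000, 1, 31))  # first-time load
load([{"region": "East", "sales": 15}], date(2001, 1, 31))  # refresh
load([{"region": "East", "sales": 20}], date(2002, 1, 31))  # refresh
archive_before(date(2001, 1, 1))                            # archive old data

print(len(warehouse), len(archive))
```

Each refresh adds a new snapshot instead of overwriting the previous one, which is what keeps the warehouse both time-variant and non-volatile.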
Characteristics of data in DW
DW can be viewed as an informational
system with the following attributes –
It is a database designed for analytical
tasks using data from multiple applications.
Supports a small number of users with long interactions.
Usage is read-intensive.
Content is periodically updated (mostly
additions)
Contains a few large tables
Benefits of DW
Access to a wide variety of data
Results can be presented in a variety of formats
(reports, graphs)
Enhances the value of operational business applications.
Targeted marketing campaigns lower the cost of product introduction.
Better decisions at low cost.
Clear picture of asset and liability management, and of enterprise-wide purchasing and inventory patterns.
Maintain good relations with customers by knowing their requirements.
Limitations of DW
Cannot create additional data.
If data quality is poor, decisions will be inaccurate.
Risks of DW
Organizational: risks related to the project team.
Technological: selection of technology, poor scalability of architecture.
Project management: scale and scope of projects are ill-defined.
Data and design: poor quality of data, unreliable data, improper collection of data.
Why Separate Data Warehouse?
Performance
– Operational databases are designed and tuned for known transactions and workloads.
– Complex OLAP queries would degrade performance for operational transactions.
– Multidimensional views and queries need special data organization, access, and implementation methods.
Why Separate Data Warehouse?
Function
– Missing data: decision support requires historical data, which operational databases do not typically maintain.
– Data consolidation: decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources, such as operational databases and external sources.
– Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.
Data Warehouse Evolution
STAGE 1 - REPORTING: WHAT happened?
STAGE 2 - ANALYZING: WHY did it happen?
STAGE 3 - PREDICTING: WHAT will happen?
STAGE 4 - OPERATIONALIZING: WHAT IS happening?
STAGE 5 - ACTIVE WAREHOUSING: MAKING it happen!
Data Warehouse Pitfalls
Some transaction processing systems feeding the
warehousing system will not contain detail
Many warehouse end users will be trained and never or
seldom apply their training
After end users receive query and report tools, requests for IS-written reports may increase
Your warehouse users will develop conflicting business
rules
Large scale data warehousing can become an exercise
in data homogenizing
Data Warehouse Pitfalls
'Overhead' can eat up great amounts of disk
space
The time it takes to load the warehouse will expand to fill the available window... and then some
Assigning security cannot be done with a
transaction processing system mindset
You are building a HIGH maintenance system
You will fail if you concentrate on resource
optimization to the neglect of project, data, and
customer management issues and an
understanding of what adds value to the
customer
Warehouse Architecture
– Data sources: any electronic repository of information that contains data of interest for management use or analytics
– Extraction: extracts data from the source locations and transforms it to the target format and structure
– Warehouse database: normally a relational database designed to hold large amounts of information for analysis
– Operations: operations requiring data loading, processing, and manipulating data from the DW; covers user management, security, and capacity restrictions as well
– Front-end tools: business intelligence tools, executive information systems, OLAP, and data mining applications
Characterizations
– Aggregate queries over specified attributes (called dimensions)
– Data organized around major subjects such as suppliers, stores, etc.
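An aggregate query over a dimension can be sketched in a few lines of plain Python. The fact rows, the `amount` measure, and the `aggregate` helper are hypothetical; only the supplier and store dimensions are taken from the characterization above.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimensions (supplier, store) and one measure (amount).
facts = [
    {"supplier": "S1", "store": "Store1", "amount": 40},
    {"supplier": "S1", "store": "Store2", "amount": 10},
    {"supplier": "S2", "store": "Store1", "amount": 25},
]


def aggregate(rows, dimension, measure):
    """Sum the measure for each distinct value of the chosen dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


by_supplier = aggregate(facts, "supplier", "amount")
by_store = aggregate(facts, "store", "amount")
print(by_supplier)
print(by_store)
```

The same facts answer different questions depending on which dimension is chosen, which is the idea behind multidimensional views.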
Business Intelligence vs Transaction Processing
Operational systems are designed to work
with small pieces of information
Operational data must frequently be updated
in real time
Operational system schemas are designed
for rapid data input.
Operational users need immediate response
Operational system usage patterns are
relatively predictable
Design of operational systems is complex.
Uses of Data Warehouses
Data mining
Advantages of Warehousing
Lowers cost of information access
Identifies hidden business opportunities
Types of Data Warehouses
Operational data store: Operational
data mirror.
– Example: item in stock
Enterprise data warehouse:
Historical analysis, Complex
pattern analysis
Data marts
Return on Investment
A data warehouse helps the organization serve its customers' market better. The return depends on:
– More rapid access to data
– More reliable reporting
– More flexible data presentation
Return on Investment (contd)
Warehousing go/no go decision depends on:
– Does it give competitive advantage?
– Does it improve the bottom line?
– Will it deliver on all its promises?
– Will it be delivered on time?
– What is the risk, if you don’t do it?
– What is the risk, if you do it?
– Will it be delivered on budget?
[Figure: source systems (custom and packaged applications) supplying client data to executives, managers, and business analysts]
[Figure: source systems (custom application, ERP) feeding a data warehouse and OLAP cubes, which supply client data to executives, managers, and business analysts]
Conclusion
File systems are primitive reporting
structures, generally targeted towards a
person. They are rigid in format and
generate periodic reports.