
Data Warehouse

DATA WAREHOUSE
The data warehouse is an informational environment that:
Provides an integrated and total view of the enterprise
Makes the enterprise's current and historical information easily available for decision making
Makes decision-support transactions possible without hindering operational systems
Renders the organization's information consistent
Presents a flexible and interactive source of strategic information

Concept: An Environment, Not a Product
A data warehouse is not a single software or hardware product you purchase to provide strategic information.
An ideal environment for data analysis and decision support
Fluid, flexible, and interactive
100 percent user-driven
Very responsive and conducive to the ask-answer-ask-again pattern
Provides the ability to discover answers to complex, unpredictable questions

Concept: A Blend of Many Technologies
Take all the data from the operational systems
Where necessary, include relevant data from outside, such as industry benchmark indicators
Integrate all the data from the various sources
Remove inconsistencies and transform the data
Store the data in formats suitable for easy access for decision making

FEATURES
Subject-Oriented Data
Integrated Data
Time-Variant Data
Nonvolatile Data
Data Granularity

Basic and Fundamental Questions
Top-down or bottom-up approach?
Enterprise-wide or departmental?
Which first: data warehouse or data mart?
Build a pilot or go with a full-fledged implementation?
Dependent or independent data marts?

Data Warehouse & Data Mart

DATA WAREHOUSE
Corporate/enterprise-wide
Union of all data marts
Data received from staging area
Queries on presentation resource
Structure for corporate view of data
Organized on E-R model

DATA MART
Departmental
A single business process
Star-join (facts & dimensions)
Technology optimal for data access and analysis
Structure to suit the departmental view of data
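To make the star-join idea concrete, here is a minimal sketch of a departmental data mart for a single business process (sales), using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date), columns, and sample rows are hypothetical; the sketch only shows the shape: one fact table keyed to several dimension tables, queried by joining the facts to each dimension and grouping on dimension attributes.

```python
import sqlite3

# Hypothetical sales data mart: one fact table (fact_sales) whose rows point
# at two dimension tables (dim_product, dim_date), i.e. a star schema.
SCHEMA = """
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales  (
    product_key INTEGER REFERENCES dim_product (product_key),
    date_key    INTEGER REFERENCES dim_date (date_key),
    units       INTEGER,
    revenue     REAL
);
"""


def build_mart(conn):
    """Create the star schema and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240101, 2024, 1), (20240201, 2024, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 20240101, 10, 100.0), (2, 20240101, 5, 75.0),
                      (3, 20240201, 2, 30.0), (1, 20240201, 4, 40.0)])


def revenue_by_category_and_month(conn):
    """Star join: join the facts to each dimension, then group on dimension attributes."""
    return conn.execute("""
        SELECT p.category, d.year, d.month, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_key = p.product_key
        JOIN dim_date    AS d ON f.date_key = d.date_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_mart(conn)
    for row in revenue_by_category_and_month(conn):
        print(row)
```

Queries against a mart like this constrain and group on dimension attributes (category, month) while summing the numeric facts, which is what makes the star-join structure "technology optimal" for departmental analysis.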

Top-Down Versus Bottom-Up Approach
Top-Down Approach

The advantages
A truly corporate effort, an enterprise view of data
Inherently architected, not a union of disparate data marts
Single, central storage of data about the content
Centralized rules and control
May see quick results if implemented with iterations

The disadvantages
Takes longer to build even with an iterative method
High exposure/risk to failure
Needs a high level of cross-functional skills
High outlay without proof of concept

Top-Down Versus Bottom-Up Approach
Bottom-Up Approach

The advantages
Faster and easier implementation of manageable pieces
Favorable return on investment and proof of concept
Less risk of failure
Inherently incremental; can schedule important data marts first
Allows project team to learn and grow

The disadvantages
Each data mart has its own narrow view of data
Permeates redundant data in every data mart
Perpetuates inconsistent and irreconcilable data
Proliferates unmanageable interfaces

A Practical Approach
The steps are as follows:
Plan and define requirements at the overall corporate level
Create a surrounding architecture for a complete warehouse
Conform and standardize the data content
Implement the data warehouse as a series of supermarts, one at a time

THE COMPONENTS

Trends: Products by Functions
Data Integrity & Cleansing (12)
Data Modeling (10)
Extraction/Transformation: Generic (26); Application-specific (9)
Data Movement (12)
Information Servers: Relational DBs (9); Specialized Indexed DBs (5); Multidimensional DBs (16)
Decision Support: Relational OLAP (9); Desktop OLAP (9); Query & Reporting (19); Data Mining (23); Application Development (9)
Administration & Management: Metadata Management (14); Monitoring (5); Job Scheduling (2); Query Governing (3); Systems Management (1)
DW-Enabled Applications: Finance (10); Sales/Marketing/CRM (23); Balanced Scorecard (5); Industry-specific (21)
Turnkey Systems (14)

PLANNING YOUR DATA WAREHOUSE
Key Issues
Value and Expectations.
Risk Assessment.
Top-down or Bottom-up.
Build or Buy.
Single Vendor or Best-of-Breed.

Business Requirements, Not Technology
Let business requirements drive your data warehouse, not technology.
Data warehousing is not about technology; it is about solving users' need for strategic information.
Do not plan to build the data warehouse before understanding the requirements.

What types of information must you gather in the preliminary survey?
Mission and functions of each user group
Computer systems used by the group
Key performance indicators
Factors affecting success of the user group
Who the customers are and how they are classified
Types of data tracked for the customers, individually and in groups
Products manufactured or sold
Categorization of products and services
Locations where business is conducted
Levels at which profits are measured: per customer, per product, per district
Levels of cost details and revenue
Current queries and reports for strategic information

Justifying Your Data Warehouse
Here are some sample approaches for preparing the justification:
Calculate the current technology costs to produce the applications and
reports supporting strategic decision making. Compare this with the
estimated costs for the data warehouse and find the ratio between the
current costs and proposed costs. See if this ratio is acceptable to senior
management.
Calculate the business value of the proposed data warehouse with the
estimated dollar values for profits, dividends, earnings growth, revenue
growth, and market share growth. Review this business value expressed
in dollars against the data warehouse costs and come up with the
justification.
Do the full-fledged exercise. Identify all the components that will be
affected by the proposed data warehouse and those that will affect the
data warehouse. Start with the cost items, one by one, including
hardware purchase or lease, vendor software, in-house software,
installation and conversion, ongoing support, and maintenance costs.
Then put a dollar value on each of the tangible and intangible benefits
including cost reduction, revenue enhancement, and effectiveness in the
business community. Go further to do a cash flow analysis and calculate
the ROI.
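The arithmetic behind the first and third approaches can be sketched in a few lines of Python. The yearly cost and benefit figures are invented for illustration, the discount rate is an assumed parameter, and "simple ROI" here means undiscounted (total benefits minus total costs) divided by total costs; your finance department may define ROI and the cash flow analysis differently.

```python
from dataclasses import dataclass


@dataclass
class Year:
    costs: float     # hardware, software, installation, support, maintenance
    benefits: float  # dollar value put on tangible and intangible benefits


def cost_ratio(current_costs, proposed_costs):
    """Approach 1: ratio of current reporting costs to estimated warehouse costs."""
    return current_costs / proposed_costs


def net_cash_flows(years):
    """Benefits minus costs for each year of the plan."""
    return [y.benefits - y.costs for y in years]


def simple_roi(years):
    """Undiscounted ROI: (total benefits - total costs) / total costs."""
    total_costs = sum(y.costs for y in years)
    total_benefits = sum(y.benefits for y in years)
    return (total_benefits - total_costs) / total_costs


def npv(years, rate):
    """Net present value of the cash flows; year 0 (the initial build) is not discounted."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(net_cash_flows(years)))


def payback_year(years):
    """First year in which the cumulative net cash flow is no longer negative."""
    cumulative = 0
    for t, cf in enumerate(net_cash_flows(years)):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None  # not paid back within the planning horizon


if __name__ == "__main__":
    # Hypothetical 4-year plan: heavy build cost in year 0, benefits ramp up.
    plan = [Year(500_000, 0), Year(150_000, 300_000),
            Year(150_000, 450_000), Year(150_000, 500_000)]
    print(f"cost ratio: {cost_ratio(400_000, 1_000_000):.2f}")
    print(f"simple ROI: {simple_roi(plan):.1%}")
    print(f"NPV @ 10%:  {npv(plan, 0.10):,.0f}")
    print(f"payback in year {payback_year(plan)}")
```

With these made-up figures the plan pays back in year 3; changing the discount rate or the benefit estimates shows how sensitive the justification is to the dollar values placed on intangible benefits.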

DATA WAREHOUSING INITIATIVE:
Outline for Overall Plan
INTRODUCTION
MISSION STATEMENT
SCOPE
GOALS & OBJECTIVES
KEY ISSUES & OPTIONS
VALUES & EXPECTATIONS
JUSTIFICATION
EXECUTIVE SPONSORSHIP
IMPLEMENTATION STRATEGY
TENTATIVE SCHEDULE
PROJECT AUTHORIZATION

How is it Different?
Suggestions
Consciously recognize that a data warehouse project has broader scope, tends
to be more complex, and involves many different technologies.
Allow for extra time and effort for newer types of activities.
Do not hesitate to find and use specialists wherever in-house talent is not
available. A data warehouse project has many out-of-the-ordinary tasks.
Metadata in a data warehouse is so significant that it needs special treatment
throughout the project. Pay extra attention to building the metadata
framework properly.
Typically, you will be using a few third-party tools during the development and
for ongoing functioning of the data warehouse. In your project schedule, plan
to include time for the evaluation and selection of tools.
Allow ample time to build and complete the infrastructure.
Include enough time for the architecture design.
Involve the users in every stage of the project. Data warehousing could be
completely new to both IT and the users in your company. A joint effort is
imperative.
Allow sufficient time for training the users in the query and reporting tools.
Because of the large number of tasks in a data warehouse project, parallel
development tracks are absolutely necessary. Be prepared for the challenges
of running parallel tracks in the project life cycle.

The Development Phases


INTRODUCTION
PURPOSE
ASSESSMENT OF READINESS
GOALS & OBJECTIVES
STAKEHOLDERS
ASSUMPTIONS
CRITICAL ISSUES
SUCCESS FACTORS
PROJECT TEAM
PROJECT SCHEDULE
DEPLOYMENT DETAILS

Data Extraction
Select data sources and determine the types of filters to be applied to individual sources
Generate automatic extract files from operational systems
using replication and other techniques
Create intermediary files to store selected data to be
merged later
Transport extracted files from multiple platforms
Provide automated job control services for creating
extract files
Reformat input from outside sources
Reformat input from departmental data files, databases,
and spreadsheets
Generate common application code for data extraction
Resolve inconsistencies for common data elements from
multiple sources
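Several of these extraction tasks (per-source filters, reformatting departmental input, resolving inconsistent codes for a common data element, writing an intermediary file for a later merge) can be sketched with Python's standard csv module. Both sources, their column names, the TEST-region filter, and the 1/0 gender coding are hypothetical, and the "intermediary file" is an in-memory string rather than a file on disk.

```python
import csv
import io

# Hypothetical extract from an operational CRM system.
CRM_ROWS = [
    {"customer_id": "101", "gender": "M", "region": "EAST"},
    {"customer_id": "104", "gender": "F", "region": "TEST"},  # test record
    {"customer_id": "105", "gender": "F", "region": "WEST"},
]

# Hypothetical departmental spreadsheet saved as CSV: different column names
# and a different code for the same data element (1/0 instead of M/F).
BILLING_CSV = "cust_no,sex,active\n102,1,Y\n103,0,N\n106,0,Y\n"

FIELDS = ["source", "customer_id", "gender"]


def extract_crm(rows):
    """Source-specific filter: skip test records."""
    return [{"source": "crm", "customer_id": r["customer_id"], "gender": r["gender"]}
            for r in rows if r["region"] != "TEST"]


def extract_billing(csv_text):
    """Reformat spreadsheet input, keep active accounts, map 1/0 to M/F."""
    codes = {"1": "M", "0": "F"}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"source": "billing", "customer_id": r["cust_no"], "gender": codes[r["sex"]]}
            for r in reader if r["active"] == "Y"]


def write_intermediary(rows):
    """Write selected rows to an intermediary extract (in memory here) for a later merge."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    extracted = extract_crm(CRM_ROWS) + extract_billing(BILLING_CSV)
    print(write_intermediary(extracted))
```

Tagging each row with its source at extraction time keeps the later merge and the audit trail simple, since every record already says where it came from.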

Data Transformation
Map input data to data for the data warehouse repository
Clean data, deduplicate, and merge/purge
Denormalize extracted data structures as
required by the dimensional model of the data
warehouse
Convert data types
Calculate and derive attribute values
Check for referential integrity
Aggregate data as needed
Resolve missing values
Consolidate and integrate data
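A minimal sketch of several of these transformation steps applied to staged records: cleaning, deduplication, a referential-integrity check, type conversion, resolution of missing values, a derived attribute, and aggregation. The record layout is hypothetical; deduplication is keyed on order_id only, and replacing a missing quantity with a default of 1 is an illustrative policy, not a recommendation. Real merge/purge logic usually matches on several fields.

```python
from collections import defaultdict
from datetime import date

# Hypothetical staged sales records (all strings, as they arrive from extract files).
RAW = [
    {"order_id": "1", "product": " widget ", "qty": "3", "price": "2.50", "order_date": "2024-01-15"},
    {"order_id": "1", "product": "WIDGET", "qty": "3", "price": "2.50", "order_date": "2024-01-15"},  # duplicate
    {"order_id": "2", "product": "gadget", "qty": "", "price": "4.00", "order_date": "2024-01-20"},   # missing qty
    {"order_id": "3", "product": "gizmo", "qty": "1", "price": "9.99", "order_date": "2024-02-02"},   # unknown product
]
KNOWN_PRODUCTS = {"WIDGET", "GADGET"}  # stands in for the product dimension


def transform(raw, known_products, default_qty=1):
    """Return (clean rows, rejected rows)."""
    clean, rejects, seen = [], [], set()
    for r in raw:
        if r["order_id"] in seen:                        # deduplicate on order_id
            continue
        seen.add(r["order_id"])
        product = r["product"].strip().upper()          # clean the value
        if product not in known_products:               # referential integrity check
            rejects.append(r)
            continue
        qty = int(r["qty"]) if r["qty"] else default_qty  # missing value + type conversion
        price = float(r["price"])
        clean.append({
            "order_id": int(r["order_id"]),
            "product": product,
            "qty": qty,
            "amount": round(qty * price, 2),             # derived attribute
            "month": date.fromisoformat(r["order_date"]).strftime("%Y-%m"),
        })
    return clean, rejects


def aggregate(rows):
    """Sum amounts by (product, month)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["product"], r["month"])] += r["amount"]
    return dict(totals)


if __name__ == "__main__":
    clean, rejects = transform(RAW, KNOWN_PRODUCTS)
    print(aggregate(clean))
    print("rejected:", rejects)
```

Rejected rows are returned rather than dropped silently so they can be reported back to the source system owners.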

Data Staging
Provide backup and recovery for staging area repositories
Sort and merge files
Create files as input to make changes to dimension tables
If data staging storage is a relational database, create and
populate database
Preserve audit trail to relate each data item in the data
warehouse to input source
Resolve and create primary and foreign keys for load
tables
Consolidate datasets and create flat files for loading
through DBMS utilities
If staging area storage is a relational database, extract
load files
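The staging tasks of sorting, resolving primary and foreign keys, preserving an audit trail, and producing flat files for a DBMS load utility can be sketched as follows. The surrogate-key scheme (sequential integers assigned in sorted natural-key order), the pipe-delimited format, the batch identifier, and all field names are assumptions for illustration; a real load would use whatever format the target database's bulk loader expects.

```python
import csv
import io

# Hypothetical transformed records waiting in the staging area.
CUSTOMERS = [
    {"source": "crm", "customer_id": "C7", "name": "Acme"},
    {"source": "billing", "customer_id": "102", "name": "Acme Corp"},
]
ORDERS = [
    {"source": "crm", "customer_id": "C7", "order_id": "O1", "amount": "250.00"},
    {"source": "crm", "customer_id": "C9", "order_id": "O2", "amount": "10.00"},  # no such customer
]


def assign_surrogate_keys(customers):
    """Sort by (source, natural key) and give each pair a sequential warehouse key."""
    key_map = {}
    for c in sorted(customers, key=lambda c: (c["source"], c["customer_id"])):
        key_map.setdefault((c["source"], c["customer_id"]), len(key_map) + 1)
    return key_map


def to_flat_file(rows, delimiter="|"):
    """Delimited text suitable as input to a bulk-load utility."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def build_load_files(customers, orders, batch_id):
    key_map = assign_surrogate_keys(customers)
    names = {(c["source"], c["customer_id"]): c["name"] for c in customers}

    # Dimension load rows: surrogate key, attribute, then audit columns
    # (source system, source key, load batch) tying the row back to its input.
    dim_rows = [[sk, names[nk], nk[0], nk[1], batch_id]
                for nk, sk in sorted(key_map.items(), key=lambda kv: kv[1])]

    # Fact load rows: resolve the foreign key; hold back rows with no match.
    fact_rows, rejects = [], []
    for o in orders:
        sk = key_map.get((o["source"], o["customer_id"]))
        if sk is None:
            rejects.append(o)
            continue
        fact_rows.append([sk, o["amount"], o["source"], o["order_id"], batch_id])

    return to_flat_file(dim_rows), to_flat_file(fact_rows), rejects


if __name__ == "__main__":
    dim_file, fact_file, rejects = build_load_files(CUSTOMERS, ORDERS, "B001")
    print(dim_file, fact_file, rejects, sep="\n")
```

Keeping the source system and source key on every load row is one simple way to satisfy the audit-trail requirement: any warehouse row can be traced back to the exact input record and load batch.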
