
DATA WAREHOUSING

WHAT IS DW?
DW: a storage area for processed and integrated data drawn from different sources (operational data and external data)

A data warehouse allows its users to extract required data for Business Analysis & Strategic Decision Making

OTHER
A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management's decisions.
William H. Inmon

SUBJECT ORIENTED
Example for an insurance company:
Applications area: Commercial and Life Insurance Systems, Auto and Fire Policy Processing Systems, Accounting System, Claims Processing System, Billing System
Data warehouse (subject data): Customer, Policy, Losses, Premium

INTEGRATED
Data is stored once in a single integrated location (e.g. insurance company)

[Figure: customer data stored in several databases (Auto Policy Processing System; Fire Policy Processing System; FACTS, LIFE Commercial, Accounting Applications) is integrated into the Data Warehouse Database under Subject = Customer]
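
As a rough illustration of integration, the sketch below merges customer rows from two source systems into one record per customer. The field names, sample rows, and the `integrate_customers` helper are all hypothetical; they only assume that each system stores the same customers under different column names and encodings.

```python
# Hypothetical extracts from two source systems (names and fields are assumptions).
auto_policy_customers = [
    {"cust_id": "C001", "name": "Ann Lee", "sex": "Female"},
    {"cust_id": "C002", "name": "Bob Roy", "sex": "Male"},
]
fire_policy_customers = [
    {"customer_no": "C001", "full_name": "Ann Lee", "gender": "F"},
    {"customer_no": "C003", "full_name": "Eve Kim", "gender": "F"},
]

# One agreed encoding for the warehouse, whatever the source system used.
SEX_CODES = {"Male": "M", "Female": "F", "M": "M", "F": "F"}


def integrate_customers(auto_rows, fire_rows):
    """Return one warehouse record per customer, keyed by customer id."""
    warehouse = {}
    for row in auto_rows:
        warehouse[row["cust_id"]] = {
            "customer_id": row["cust_id"],
            "name": row["name"],
            "sex": SEX_CODES[row["sex"]],
            "sources": ["auto"],
        }
    for row in fire_rows:
        key = row["customer_no"]
        if key in warehouse:
            # Same customer already loaded: record the extra source only.
            warehouse[key]["sources"].append("fire")
        else:
            warehouse[key] = {
                "customer_id": key,
                "name": row["full_name"],
                "sex": SEX_CODES[row["gender"]],
                "sources": ["fire"],
            }
    return warehouse


customers = integrate_customers(auto_policy_customers, fire_policy_customers)
```

Here customer C001 appears in both systems but is stored once, with both sources noted.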

TIME - VARIANT
Data is stored as a series of snapshots or views which record how it is collected across time.
Data is tagged with some element of time: creation date, as-of date, etc.
Data is available online for long periods of time (for example, five or more years) for trend analysis and forecasting.
[Figure: Data Warehouse Data, with labels Key, Time, Data]

NON-VOLATILE
Existing data in the warehouse is not overwritten or updated.
[Figure: production applications Update, Insert, and Delete data in the production databases; in the data warehouse environment, data from production databases and external sources is Loaded into the data warehouse database, which is Read-Only]
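
The time-variant and non-volatile properties can be sketched together as an append-only store of dated snapshots. This is a minimal sketch, not a real warehouse API: the `SnapshotStore` class, its method names, and the sample policy data are assumptions made for illustration.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: snapshots are loaded and read, never updated or deleted."""

    def __init__(self):
        self._rows = []  # each row is (key, as_of_date, data)

    def load(self, key, as_of, data):
        # A new snapshot is appended; earlier snapshots stay untouched.
        self._rows.append((key, as_of, dict(data)))

    def as_of(self, key, when):
        """Return the latest snapshot for `key` dated on or before `when`."""
        candidates = [r for r in self._rows if r[0] == key and r[1] <= when]
        if not candidates:
            return None
        latest = max(candidates, key=lambda r: r[1])
        return dict(latest[2])  # hand back a copy so stored data stays read-only

    def history(self, key):
        """Return all (date, data) snapshots for `key`, oldest first."""
        rows = [(r[1], dict(r[2])) for r in self._rows if r[0] == key]
        return sorted(rows, key=lambda pair: pair[0])


store = SnapshotStore()
store.load("P-100", date(2020, 1, 1), {"premium": 500})
store.load("P-100", date(2021, 1, 1), {"premium": 550})  # new snapshot, no overwrite

mid_2020 = store.as_of("P-100", date(2020, 6, 30))
early_2021 = store.as_of("P-100", date(2021, 3, 1))
```

Because the second load appends rather than overwrites, both premium values remain available for trend analysis.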

DATA WAREHOUSE DESIGN


DW development can follow three different methodologies:
- Bottom-up design
- Top-down design
- Hybrid design

ARCHITECTURE

EXTRACT-TRANSFORM-LOAD
ETL is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. ETL involves the following tasks:

- Extracting the data from source systems (SAP, ERP, other operational systems): data from different source systems is converted into one consolidated data warehouse format which is ready for transformation processing.
- Transforming the data, which may involve the following tasks:
  - Applying business rules (so-called derivations, e.g., calculating new measures and dimensions)
  - Cleaning (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F")
  - Filtering (e.g., selecting only certain columns to load)
  - Splitting a column into multiple columns and vice versa
  - Joining together data from multiple sources (e.g., lookup, merge)
  - Transposing rows and columns
  - Applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty, reject the row from processing)
- Loading the data into a data warehouse, data repository, or other reporting applications
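
Several of the transformation tasks listed above can be sketched in a few lines of plain Python. The row layout, the `regions` lookup table, the `total` measure, and all function names are hypothetical; the sketch only shows validation, cleaning, a lookup join, a derivation, and column filtering applied in sequence before loading.

```python
SEX_CODES = {"Male": "M", "Female": "F"}
TARGET_COLUMNS = ("id", "name", "sex", "region", "total")  # assumed target layout


def is_valid(row):
    """Validation: reject the row if its first 3 columns are all empty."""
    first_three = list(row.values())[:3]
    return any(value not in (None, "") for value in first_three)


def transform(row, regions):
    """Apply cleaning, a lookup join, a derivation, and column filtering."""
    cleaned = dict(row)
    # Cleaning: map NULL (None) to 0 and spelled-out values to codes.
    if cleaned["quantity"] is None:
        cleaned["quantity"] = 0
    cleaned["sex"] = SEX_CODES.get(cleaned["sex"], cleaned["sex"])
    # Joining: look up the region name from a second source.
    cleaned["region"] = regions.get(cleaned["region_id"], "UNKNOWN")
    # Derivation: calculate a new measure from existing columns.
    cleaned["total"] = cleaned["quantity"] * cleaned["unit_price"]
    # Filtering: keep only the columns that will be loaded.
    return {col: cleaned[col] for col in TARGET_COLUMNS}


def run_etl(source_rows, regions):
    """Return (rows ready to load, rejected rows)."""
    loaded, rejected = [], []
    for row in source_rows:
        if is_valid(row):
            loaded.append(transform(row, regions))
        else:
            rejected.append(row)
    return loaded, rejected


regions = {"R1": "North"}
source_rows = [
    {"id": 1, "name": "Ann", "sex": "Female", "quantity": 2, "unit_price": 10.0, "region_id": "R1"},
    {"id": 2, "name": "Bob", "sex": "Male", "quantity": None, "unit_price": 5.0, "region_id": "R9"},
    {"id": None, "name": "", "sex": None, "quantity": 1, "unit_price": 1.0, "region_id": "R1"},
]
warehouse, rejected = run_etl(source_rows, regions)
```

The third source row has its first three columns empty, so it is rejected rather than loaded.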


DATA MARTS
The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. The information in data marts pertains to a single department.

Each department or business unit is considered the owner of its data mart including all the hardware, software and data.
This enables each department to use, manipulate and develop their data any way they see fit without altering information inside other data marts or the data warehouse.


A data mart takes less time to implement than a data warehouse, typically around 4-12 months.
It is also relatively cheaper than a data warehouse.
[Figure: levels between Information and Data: Individually Structured; Departmentally Structured; Organizationally Structured; Data Warehouse]
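
The idea of a data mart as a department-owned subset can be illustrated with a small sketch. The sample rows and the `build_data_mart` helper are hypothetical; the point is only that the mart holds its own copy of one department's rows, so changes there do not alter the warehouse.

```python
# Hypothetical warehouse rows; the column names are assumptions.
warehouse_rows = [
    {"department": "sales", "customer": "C001", "amount": 120},
    {"department": "claims", "customer": "C002", "amount": 300},
    {"department": "sales", "customer": "C003", "amount": 80},
]


def build_data_mart(rows, department):
    """Copy the rows of one department so the mart can change independently."""
    return [dict(row) for row in rows if row["department"] == department]


sales_mart = build_data_mart(warehouse_rows, "sales")
sales_mart[0]["amount"] = 999  # the department edits its own copy only
```

After the edit, the first warehouse row still holds its original amount.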


NEED FOR DATA WAREHOUSING


- Better business intelligence for end-users
- Reduction in time to locate, access, and analyze information
- Consolidation of disparate information sources
- Strategic advantage over competitors
- Faster time-to-market for products and services
- Replacement of older, less-responsive decision support systems
- Reduction in demand on IS to generate reports


ADVANTAGES & LIMITATIONS


ADVANTAGES
- Integration of data from multiple sources
- Performing new types of analyses
- Reduced cost of access to historical data
- Improved decision support system

LIMITATIONS
- Long initial implementation time and associated high cost
- Adding new data sources takes time and carries a high associated cost