
DATA WAREHOUSE

Lecture Notes

Ali Abdullah Khan


Aliabdullahkhanb2@gmali.com
Contents
Lecture 1 and 2: Definition of Data Warehouse
Definition
Who needs a data warehouse?
Lecture 3 and 4: Example
What is a Data Warehouse?
Example and Operational and Strategic Data (Information)
Lecture 5 and 6: History, Types, and Slides
History of Data Warehouse
How does a Data Warehouse work?
Types of Data Warehouse
OLAP
Assignment 1
Further explanation of Data Warehouse
Problem: Heterogeneous Information
The Traditional Research Approach
Disadvantages of Query-Driven Approach
Lecture 7 and 8: Data Mining
The key features of data mining
Benefits of data mining
Difference between Data Warehouse and Data Mining
Tools and Techniques
Lecture 9 and 10: Typical Application of Data Warehouse
Typical Application of Data Warehouse
Data Mining in Telecommunication
Data Mining for Telecommunication Industry
Multidimensional analysis of telecommunication data
Three Units of Telecommunication
Data Warehouse
Lecture 1 and 2: Definition of Data Warehouse
Definition
Data warehousing is defined as a technique for collecting and managing data
from varied sources to provide meaningful business insights. It is a blend of
technologies and components that aids the strategic use of data.
It is the process of transforming data into information and making it available
to users in time for that information to influence their decisions.
A data warehouse system is also known by the following names:
1. Decision Support System (DSS)
2. Executive Information System
3. Management Information System
4. Business Intelligence Solution
5. Analytic Application
6. Data Warehouse
Who needs a data warehouse?
A data warehouse is needed by many types of users, such as:
• Decision makers who rely on large amounts of data
• Users who run customized, complex processes to obtain information from
multiple data sources
• People who want simple technology to access the data
• People who want a systematic approach to making decisions
• Users who need fast performance on huge amounts of data, which is a
necessity for reports, grids, or charts
• Anyone who wants to discover 'hidden patterns' in data flows and
groupings, for whom a data warehouse is the first step
Lecture 3 and 4: Example
What is a Data Warehouse?
“A DW is a
• subject-oriented,
• integrated,
• time-varying,
• non-volatile
collection of data that is used primarily in organizational decision making.”
-- W.H. Inmon, Building the Data Warehouse, 1992
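Two of the properties in this definition, time-varying and non-volatile, can be made concrete with a small sketch. The Python below assumes a hypothetical `AppendOnlyStore` class and `Snapshot` record (neither comes from the lecture): every load appends a dated row instead of overwriting the old value, so the store can answer "what was true as of this date?" questions. It illustrates the idea only and is not how any particular warehouse product is built.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One immutable row describing a subject (here, a customer) at a point in time."""
    load_date: date   # time-varying: every row records the date it describes
    customer_id: str  # subject-oriented: rows are organized around the customer
    balance: float


class AppendOnlyStore:
    """Hypothetical non-volatile store: rows are appended, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[Snapshot] = []

    def load(self, row: Snapshot) -> None:
        # A new load adds history instead of overwriting the earlier value.
        self._rows.append(row)

    def as_of(self, customer_id: str, day: date) -> Optional[float]:
        """Return the most recent balance loaded on or before `day`, if any."""
        matches = [r for r in self._rows
                   if r.customer_id == customer_id and r.load_date <= day]
        if not matches:
            return None
        return max(matches, key=lambda r: r.load_date).balance


store = AppendOnlyStore()
store.load(Snapshot(date(2024, 1, 31), "C1", 100.0))
store.load(Snapshot(date(2024, 2, 29), "C1", 250.0))
print(store.as_of("C1", date(2024, 2, 15)))  # 100.0
print(store.as_of("C1", date(2024, 3, 1)))   # 250.0
```

Because the January row is never overwritten, a report run today can still reproduce the figure that was valid in mid-February.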

Example and Operational and Strategic Data (Information)


Follow the video links:

https://www.youtube.com/watch?v=KgjUsie50WQ
or
https://www.youtube.com/watch?v=hO13N2b-gXE
Lecture 5 and 6: History, Types, and Slides
History of Data Warehouse
A data warehouse helps users understand and improve their organization's
performance. The need to warehouse data evolved as computer systems became
more complex and had to handle increasing amounts of information. However,
data warehousing is not a new idea.

Here are some key events in the evolution of the data warehouse:

• 1960s: Dartmouth and General Mills, in a joint research project, develop the
terms dimensions and facts.
• 1970s: ACNielsen and IRI introduce dimensional data marts for retail sales.
• 1983: Teradata Corporation introduces a database management system
specifically designed for decision support.
• Data warehousing started in the late 1980s, when IBM researchers Paul Murphy
and Barry Devlin developed the Business Data Warehouse.
• However, the concept as it is known today is credited to Bill Inmon, who is
considered the father of the data warehouse. He has written about a variety
of topics on the building, usage, and maintenance of the warehouse and the
Corporate Information Factory.

How does a Data Warehouse work?


A Data Warehouse works as a central repository where information arrives from
one or more data sources. Data flows into a data warehouse from transactional
systems and other relational databases.
Data may be:
1. Structured
2. Semi-structured
3. Unstructured
• The data is processed, transformed, and ingested so that users can access
the processed data in the Data Warehouse through Business Intelligence
tools, SQL clients, and spreadsheets.
• A data warehouse merges information coming from different sources into
one comprehensive database.
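The flow described above (extract data from sources, transform it, load it into one database, then query it with SQL) can be sketched in a few lines of Python, using the standard-library `sqlite3` module as a stand-in warehouse. The source records, the `sales_fact` table, and the `extract`/`transform`/`load` helpers are all hypothetical; a real pipeline would read from live systems and do far more cleaning.

```python
import sqlite3

# Hypothetical source systems that store related data in different layouts.
crm_rows = [{"cust": "C1", "region": "North "}, {"cust": "C2", "region": "SOUTH"}]
sales_rows = [("C1", "2024-01-05", 120.0),
              ("C2", "2024-01-06", 80.5),
              ("C1", "2024-02-01", 40.0)]


def extract():
    """Extract: read the raw records from each source."""
    return crm_rows, sales_rows


def transform(crm, sales):
    """Transform: clean region names and attach each sale to its customer's region."""
    region_of = {r["cust"]: r["region"].strip().lower() for r in crm}
    return [(cust, region_of.get(cust, "unknown"), day, amount)
            for cust, day, amount in sales]


def load(conn, rows):
    """Load: write the integrated rows into a single warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(customer_id TEXT, region TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))

# Users can now query the integrated data with ordinary SQL.
for region, total in warehouse.execute(
        "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"):
    print(region, total)
```

Here the two sources disagree on how region names are written, and the transform step fixes that before loading, which is the "merges information coming from different sources" step in miniature.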

Types of Data Warehouse


Three main types of Data Warehouses are:
1. Enterprise Data Warehouse:
An Enterprise Data Warehouse is a centralized warehouse. It provides
decision support services across the enterprise. It offers a unified
approach for organizing and representing data. It also provides the ability
to classify data according to subject and to give access according to
those divisions.
2. Operational Data Store:
An Operational Data Store (ODS) is a data store used when neither the
data warehouse nor the OLTP systems support an organization's reporting
needs. In an ODS, the data is refreshed in real time. Hence, it is widely
preferred for routine activities such as storing employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed
for a particular line of business, such as sales or finance. In an
independent data mart, data can be collected directly from the sources.
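A data mart as a subset of the warehouse can be illustrated with a SQL view over a warehouse table. The `dw_fact` table, its columns, and the `sales_mart` view are hypothetical names used only for this sketch, again with `sqlite3` standing in for a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide fact table covering several lines of business.
conn.execute("CREATE TABLE dw_fact (line_of_business TEXT, month TEXT, amount REAL)")
conn.executemany("INSERT INTO dw_fact VALUES (?, ?, ?)", [
    ("sales", "2024-01", 500.0),
    ("finance", "2024-01", 300.0),
    ("sales", "2024-02", 650.0),
])

# A dependent data mart: the subset of the warehouse that the sales team needs.
conn.execute("CREATE VIEW sales_mart AS "
             "SELECT month, amount FROM dw_fact WHERE line_of_business = 'sales'")

print(conn.execute("SELECT month, amount FROM sales_mart ORDER BY month").fetchall())
# [('2024-01', 500.0), ('2024-02', 650.0)]
```

A view is just one design choice: it keeps the mart consistent with the warehouse but stores nothing separately. A mart could instead be a physically separate copy, and an independent data mart, as noted above, would be loaded straight from the sources rather than from `dw_fact`.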

OLAP
Online Analytical Processing (OLAP) is a category of software tools that
provide analysis of data for business decisions. OLAP systems allow users to
analyze information from multiple database systems at one time.
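Typical OLAP operations such as roll-up (aggregating to a coarser level) and slice (fixing one dimension to a single value) can be sketched in plain Python over a tiny fact table. The data and the `roll_up`/`slice_` helpers are hypothetical; OLAP tools perform the same kind of grouping and filtering, usually over much larger, pre-organized data.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, revenue).
facts = [
    (2023, "east", "phones", 100),
    (2023, "west", "phones", 80),
    (2023, "east", "laptops", 200),
    (2024, "east", "phones", 120),
    (2024, "west", "laptops", 150),
]

# Position of each dimension inside a fact row.
DIMS = {"year": 0, "region": 1, "product": 2}


def roll_up(rows, *dims):
    """Aggregate revenue over the chosen dimensions (like SQL GROUP BY)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


def slice_(rows, dim, value):
    """Keep only rows where one dimension has a fixed value (an OLAP slice).

    Named with a trailing underscore to avoid shadowing Python's built-in slice.
    """
    return [row for row in rows if row[DIMS[dim]] == value]


print(roll_up(facts, "year"))                               # {(2023,): 380, (2024,): 270}
print(roll_up(slice_(facts, "region", "east"), "product"))  # {('phones',): 220, ('laptops',): 200}
```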

Assignment 1
What is the difference between a data warehouse and data mining?
Further explanation of Data Warehouse
Problem: Heterogeneous Information
• Different interfaces
• Different data representations
• Duplicated and inconsistent information
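To make the heterogeneity problem concrete, here is a hypothetical example of two sources that use different field names (interfaces) and formats (representations) for the same customer, followed by a normalize-then-deduplicate step. The records and the `normalize`/`integrate` helpers are invented for illustration, and matching on exact (name, phone) is a deliberately simple assumption; real record matching is usually fuzzier.

```python
# Two hypothetical sources describing overlapping customers differently:
# different field names (interfaces) and formats (representations).
source_a = [{"name": "Jane Doe", "phone": "555-0100", "city": "BOSTON"}]
source_b = [{"full_name": "jane doe", "tel": "5550100", "town": "Boston"},
            {"full_name": "John Roe", "tel": "555 0199", "town": "chicago"}]


def normalize(name, phone, city):
    """Map one record onto a common representation."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return (name.strip().title(), digits, city.strip().title())


def integrate(a, b):
    """Normalize both sources, then drop duplicates that share (name, phone)."""
    rows = [normalize(r["name"], r["phone"], r["city"]) for r in a]
    rows += [normalize(r["full_name"], r["tel"], r["town"]) for r in b]
    seen, result = set(), []
    for row in rows:
        key = row[:2]  # assumed matching rule: same name and same phone digits
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


for row in integrate(source_a, source_b):
    print(row)
```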
Solution

This is from the slides (slides 8 to 16), starting here:

The Traditional Research Approach


Query-driven (lazy, on-demand)
Disadvantages of Query-Driven Approach
• Delay in query processing
  o Slow or unavailable information sources
  o Complex filtering and integration
• Inefficient and potentially expensive for frequent queries
• Competes with local processing at sources
• Hasn’t caught on in industry

End of slides 8 to 16.
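The main cost of the query-driven approach can be shown by counting how often the sources are contacted. In this hypothetical Python sketch, each `Source` counts its `fetch` calls: the lazy, query-driven function contacts every source on every query, while the `Warehouse` class integrates the data once up front and then answers queries locally. The class and function names are illustrative only.

```python
class Source:
    """Hypothetical remote source that counts how often it is contacted."""

    def __init__(self, values):
        self.values = list(values)
        self.calls = 0

    def fetch(self):
        self.calls += 1  # each call stands in for a slow remote round trip
        return list(self.values)


def query_driven_total(sources):
    """Lazy, on-demand: contact every source and integrate at query time."""
    return sum(v for s in sources for v in s.fetch())


class Warehouse:
    """Eager: integrate the sources once in advance, then answer queries locally."""

    def __init__(self, sources):
        self.data = [v for s in sources for v in s.fetch()]

    def total(self):
        return sum(self.data)


a, b = Source([10, 20]), Source([5])
for _ in range(3):
    query_driven_total([a, b])
print(a.calls, b.calls)  # 3 3: every query reaches every source

c, d = Source([10, 20]), Source([5])
wh = Warehouse([c, d])
for _ in range(3):
    wh.total()
print(c.calls, d.calls)  # 1 1: sources are read once, at load time
```

The trade-off is freshness: the warehouse only sees changes at the sources when it is reloaded.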
Lecture 7 and 8: Data Mining
The key features of data mining are discussed below:
1. Automatic discovery of patterns
2. Prediction of likely outcomes
3. Creation of actionable information
4. Focus on large data sets and databases
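Automatic discovery of patterns can be illustrated with a brute-force version of frequent-itemset counting, the idea behind association rules. The baskets, the `min_support` threshold, and the helper names are hypothetical; practical algorithms such as Apriori avoid counting every combination, so treat this as a sketch of the concept rather than an efficient method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"bread", "butter"},
    {"butter", "milk"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions) is >= min_support."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(transactions, lhs, rhs):
    """Estimate how often `rhs` appears in baskets that already contain `lhs`."""
    with_lhs = [t for t in transactions if lhs in t]
    if not with_lhs:
        return 0.0
    return sum(rhs in t for t in with_lhs) / len(with_lhs)


print(frequent_pairs(baskets, 0.4))
print(confidence(baskets, "butter", "milk"))
```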

Benefits of data mining


1. Direct marketing
The ability to predict who is most likely to be interested in which
products.
2. Trend analysis
Understanding trends in the marketplace is a strategic advantage
because it helps reduce costs and time to market.
3. Fraud detection
Data mining techniques can help discover which insurance claims,
cellular phone calls or credit card purchases are likely to be
fraudulent.
4. Forecasting in financial markets
Data mining techniques are extensively used to help model financial
markets.
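As a very simple illustration of fraud detection, the sketch below flags purchase amounts that lie far from a customer's usual spending, measured in standard deviations (a z-score). The amounts and the `threshold=2.0` default are assumptions for the example; production fraud detection relies on much richer features and models.

```python
from statistics import mean, stdev

# Hypothetical card purchase amounts for one customer; the last one is unusual.
amounts = [25.0, 30.0, 27.5, 22.0, 31.0, 26.0, 29.0, 480.0]


def flag_outliers(values, threshold=2.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


print(flag_outliers(amounts))  # [480.0]
```

Note that the outlier itself inflates the mean and standard deviation, which is why robust measures based on the median are often preferred.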
Difference between Data Warehouse and Data Mining

Data Warehousing | Data Mining
It is a process used to integrate data from multiple sources and combine it into a single database. | It is a process used to extract useful patterns and relationships from huge amounts of data.
It provides the organization a mechanism to store huge amounts of data. | Data mining techniques are applied to the data warehouse in order to discover useful patterns.
This process must take place before the data mining process because it compiles and organizes data into a common database. | This process always takes place after the data warehousing process because it requires compiled data to extract useful patterns.
This process is carried out solely by engineers. | This process is carried out by business users with the help of engineers.


Tools and Techniques

1. Amazon Redshift is an excellent data warehouse product and a critical part of
Amazon Web Services, a very popular cloud computing platform.
Redshift is a fast, fully managed data warehouse that analyzes data using
standard SQL and existing BI tools. It is a simple and cost-effective tool
that can run complex analytical queries using smart query optimization
features. It handles analytics workloads on big data sets by using columnar
storage on high-performance disks and massively parallel processing (MPP).
2. Teradata is another market leader in database services and products. It is
an internationally renowned company headquartered in San Diego, California
(previously in Dayton, Ohio). Many large enterprises use the Teradata data
warehouse (DWH) for insights, analytics, and decision making.
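Columnar storage and massively parallel processing can be illustrated conceptually: storing each column separately means an aggregate such as SUM(amount) reads only the column it needs, and splitting that column across nodes lets each node compute a partial result that is then combined. This Python sketch simulates "nodes" with a thread pool; it is not how Redshift is implemented, and the data and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.0},
    {"order_id": 3, "region": "east", "amount": 60.0},
    {"order_id": 4, "region": "west", "amount": 45.0},
]

# Column-oriented layout: each field is stored as its own list.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def partial_sum(values):
    """Work done by one simulated compute node on its share of a column."""
    return sum(values)


def parallel_sum(column, nodes=2):
    """Split a column across `nodes` workers, sum each part, then combine."""
    if not column:
        return 0
    size = -(-len(column) // nodes)  # ceiling division: values per node
    parts = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(partial_sum, parts))


# SUM(amount) needs only the "amount" column, not whole rows.
print(parallel_sum(columns["amount"]))  # 300.0
```

Python threads do not speed up this CPU-bound work; the pool only stands in for separate machines to show how partial results are combined.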
Lecture 9 and 10: Typical Application of Data Warehouse
Typical Application of Data Warehouse
A telecommunication business solution.
Reporting tools: enable users to detect fraud and identify communication
patterns.

Data Mining in Telecommunication


• To detect fraud
• To gain knowledge about customers
• To retain customers
• What products and services yield the highest profit?
• What factors influence customers to call more at certain
times?

Data Mining for Telecommunication Industry


The telecommunication industry is rapidly expanding and highly competitive, and
there is a great demand for data mining to:
• Understand the business involved
• Identify telecommunication patterns
• Catch fraudulent activities
• Make better use of resources
• Improve the quality of service

Multidimensional analysis of telecommunication data


• Calling time
• Duration
• Location of caller
• Location of callee
• Type of call, etc.
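The dimensions listed above can be treated as grouping keys over a set of call records. In this hypothetical Python sketch, each `Call` carries those five dimensions, and `total_minutes` sums call duration over any combination of them, which is the kind of question (for example, at what times do customers call more?) raised earlier in this lecture. The records and names are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """One hypothetical call record with the dimensions listed above."""
    hour: int            # calling time (hour of day)
    duration_min: float  # duration in minutes
    caller_loc: str      # location of caller
    callee_loc: str      # location of callee
    call_type: str       # type of call


calls = [
    Call(9, 3.0, "CityA", "CityB", "domestic"),
    Call(9, 12.5, "CityA", "CityA", "domestic"),
    Call(21, 45.0, "CityB", "CityC", "international"),
    Call(21, 5.0, "CityB", "CityA", "domestic"),
]


def total_minutes(records, *dims):
    """Sum call duration grouped by any combination of dimension names."""
    totals = defaultdict(float)
    for r in records:
        totals[tuple(getattr(r, d) for d in dims)] += r.duration_min
    return dict(totals)


print(total_minutes(calls, "hour"))
print(total_minutes(calls, "caller_loc", "call_type"))
```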
Three Units of Telecommunication
1. A transmitter that takes information and converts it to a signal.
2. A transmission medium, also called the physical channel, that carries the
signal.
3. A receiver that takes the signal from the channel and converts it back into
usable information for the recipient.
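The three units can be modeled as three functions in a toy Python sketch: the transmitter turns text into a bit-string "signal", the channel carries it, and the receiver turns it back into text. The function names are hypothetical, and the channel is assumed to be ideal (no noise), which real transmission media are not.

```python
def transmitter(message: str) -> str:
    """Convert information (text) into a signal, modeled here as a string of bits."""
    return "".join(f"{byte:08b}" for byte in message.encode("utf-8"))


def channel(signal: str) -> str:
    """An ideal, noise-free transmission medium: it carries the signal unchanged."""
    return signal


def receiver(signal: str) -> str:
    """Convert the signal back into usable information for the recipient."""
    data = bytes(int(signal[i:i + 8], 2) for i in range(0, len(signal), 8))
    return data.decode("utf-8")


signal = transmitter("Hi")
print(signal)                     # 0100100001101001
print(receiver(channel(signal)))  # Hi
```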

DW-OLAP slides: slides 14 to 39.
