
Data Warehousing

In this module, we will study our first topic in business intelligence, Data
Warehousing (DW), with the following activities:
1) Define Data Warehousing;
2) Become familiar with a generic DW framework;
3) Take a closer look at the DW framework with the specifics of a case study;
4) Gain exposure to Real-time Data Warehousing.
In this module, we will cover our first topic in BI, data warehousing, with the following activities. We will start with a definition of data warehouse and data warehousing; we will then become
familiar with a generic DW framework described in the textbook; I will then introduce a case study, based on my experience in the credit card industry, on how a DW helped to launch a targeted
marketing campaign; we will conclude this module with some exposure to real-time data warehousing through the completion of HW2.

Introduction to Business Intelligence


Dr. Nuo Xu

1. Data Warehousing
Let us now study a definition of data warehouse that has been widely used in both academia and industry.

A definition of Data Warehouse


A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management's decision making
process. (W.H. Inmon, Building the Data Warehouse. John Wiley & Sons, 1996)
Inmon defines a data warehouse as a collection of data in support of management's decision making that has the following four characteristics. 1) It is subject-oriented, meaning data are
organized by subject, such as sales, products, or customers; 2) It is integrated, meaning all relevant data from different data sources are combined in ways that are conducive to the
corresponding analysis; 3) It is time-variant, meaning it typically includes historical data in the form of time series; 4) It is nonvolatile, meaning the users, who in this case are knowledge workers such as data
analysts and managers, have read access only.

Data Warehousing
The process of building and maintaining a data warehouse is known as Data
Warehousing.


2. A Generic Data Warehousing Framework


The following diagram is a schematic representation of a generic data warehousing process provided by the textbook. (I describe this framework as generic because, in practice, the
actual implementation of data warehousing varies greatly among businesses of different sizes and types.)


On the far left side of the diagram are the data sources for data warehousing, which can be either internal or external to a company. The first four database icons represent some typical internal data
sources within an organization. ERP stands for enterprise resource planning and POS stands for point of sale; these are two examples of OLTP (online transaction processing) systems that support and track
the day-to-day activities of a business. The details depend on the type of business: an ERP system in an auto manufacturer would have an inventory subsystem that supports ordering and tracking thousands of parts
from hundreds of different vendors, and a POS system for a retailer is responsible for recording each transaction and charging customers for the products or services they receive. For a social
network company like Facebook, most activities of the business occur on the web, which becomes a major data source for data warehousing. Some examples of external data are credit reports
compiled by credit bureaus and census data provided by the government.

Once data sources have been identified, the next activity in data warehousing is called ETL, which stands for extraction, transformation and load. Extraction refers to reading relevant data out of
one or more operational databases or external datasets; transformation refers to restructuring or merging the extracted data according to the needs of the analysis; and load refers to populating the data
warehouse with the transformed data.
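
The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production pipeline: the two in-memory source lists, their field names, and the `warehouse` list are hypothetical stand-ins for operational databases and a warehouse table.

```python
# Hypothetical source records standing in for two operational systems.
pos_sales = [
    {"customer": "A", "amount": "19.99"},
    {"customer": "B", "amount": "5.00"},
    {"customer": "A", "amount": "10.01"},
]
erp_customers = [
    {"customer": "A", "region": "East"},
    {"customer": "B", "region": "West"},
]


def extract():
    """Extraction: read the relevant records out of each source."""
    return list(pos_sales), list(erp_customers)


def transform(sales, customers):
    """Transformation: merge the two sources and total the sales per customer."""
    region_of = {c["customer"]: c["region"] for c in customers}
    totals = {}
    for sale in sales:
        key = sale["customer"]
        totals[key] = totals.get(key, 0.0) + float(sale["amount"])
    return [
        {"customer": cust, "region": region_of[cust], "total_sales": round(total, 2)}
        for cust, total in sorted(totals.items())
    ]


def load(rows, warehouse):
    """Load: populate the warehouse with the transformed rows."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(*extract()), [])
```

Keeping the three steps as separate functions mirrors the way the framework treats them as separate activities; a real ETL job would read from databases or files rather than from lists.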
The resulting collection of data, shown in the center of the diagram, is the data warehouse. If the data warehousing process is applied to all corporate-wide data sources, the result is referred to as an enterprise
data warehouse; if only a subset of corporate-wide data that is of particular interest to a group of users is warehoused, it is referred to as a data mart. So a company might have one data mart for
engineering, one for marketing, and so on. Metadata refers to a special type of data that describes what data are available in a data warehouse and where they come from.

The rightmost portion of the diagram describes some of the many applications and uses of a data warehouse, such as routine business reporting, data mining and text mining, dashboards (which
summarize and present data in the most easily digestible way for knowledge workers), and many other customized applications.


3. A Case Study: Targeted Marketing


We will now turn to a hypothetical case study, based on my experience in the credit card industry, on how data warehousing enables a targeted marketing campaign. This case study is intended to
provide specifics that substantiate our understanding of the DW framework we just introduced.

Business Background
ABC, a credit card company, attempts to expand into other consumer
lending businesses by developing a low-interest personal loan product.
Using BI to launch a targeted marketing campaign
o Identify the most likely responders to a new personal loan product among
existing card members.
o Simple approach: highest outstanding balance.
o Alternative: develop a response score to measure the likelihood of responding
by taking multiple financial factors into account simultaneously?
The background of this case study is that a credit card company attempts to expand into other consumer lending businesses by developing a low-interest personal loan product. The company is first trying to
sell the product to its existing card member base. One approach to marketing a new product is to send out mail to card members at random, hoping they will be interested in such a product. A
better approach is so-called targeted marketing, i.e., marketing only to those who are most likely to find that this product meets their needs. There are many ways of identifying likely
responders to a loan product. A simple solution is to assume that people with higher outstanding balances are more in need of extra credit, which is sufficient in our case study to illustrate a DW framework.
But there are other, more sophisticated solutions.

Business Objective
Solicit the card members with the highest total average monthly balance
(including the ABC card and all non-ABC trades) in the last 12 months through a mail
campaign, to achieve the highest response rate for the mailing budget.
Non-ABC trade balances include other credit cards a current ABC card member might have and other types of loans, such as car loans and mortgages.
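
One way to read this objective as a computation is sketched below: average each member's total (ABC plus non-ABC) monthly balance, rank members by that average, and keep as many as the mailing budget allows. The balances, the `mail_budget` parameter, and the function name are hypothetical, and only two months are shown instead of twelve.

```python
from collections import defaultdict

# Hypothetical rows: (card member, month, ABC balance, non-ABC balance).
monthly_balances = [
    ("A", 201112, 400, 1000),
    ("A", 201201, 500, 1000),
    ("B", 201112, 100, 200),
    ("B", 201201, 100, 400),
    ("C", 201112, 900, 2100),
    ("C", 201201, 800, 2200),
]


def top_members(rows, mail_budget):
    """Rank members by average total monthly balance; keep the top `mail_budget`."""
    totals = defaultdict(list)
    for member, _month, abc_balance, other_balance in rows:
        totals[member].append(abc_balance + other_balance)
    averages = {m: sum(v) / len(v) for m, v in totals.items()}
    ranked = sorted(averages, key=averages.get, reverse=True)
    return ranked[:mail_budget], averages


mail_list, averages = top_members(monthly_balances, mail_budget=2)
```

With these made-up numbers, member C has the highest average total balance and member B the lowest, so a budget of two mail pieces selects C and A.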


Data Warehouse Framework

[Diagram: components of the DW framework for the case study]
Data sources
o Credit Bureaus (CB): monthly total balance on all other cards
o ABC Card Payment Billing: monthly ABC card balance
Enterprise Data Depository
Teradata Database
o CB table (time series)
o ABC table
Unix Environment
o SAS enterprise
o data aggregation
o analytical modeling
Windows Environment
o Data Analysts
Output: a mail list of card members for the marketing campaign

This diagram illustrates the DW framework in support of this targeted marketing campaign.


Data Sources
Data sources include an internal operational database and an external data source.

Card Payment Billing System
The card payment billing system supports and captures all transactional activities, such as purchases and payments, for each card member.

Credit reports from Credit Bureaus
Credit bureaus are data sharing hubs for individual financial records, such as outstanding loans and payment history. Data sent from all lenders are aggregated and sent back to lenders by subscription,
so that a lender can form a holistic view of its customers, knowing their financial behavior with all other lenders.

ETL (Extract, Transform and Load)
Many ETL processes are required. We will look at only two examples here, marked by red arrows in the diagram. The first ETL process is the following.

Extract semi-structured credit reports to compute the aggregated non-ABC trade total balance
Credit reports are in semi-structured textual formats. For example, the text is divided into sections; within some sections, multiple fields are separated by blank spaces, while within other
sections the text may appear as paragraphs. To obtain the aggregated non-ABC trade total balance, an ETL process needs to select the right section of the credit report, extract the balance
information at the specified locations for every non-ABC trade, and compute the total monthly balance. The results are loaded into the enterprise data depository as flat files.
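
The extraction step can be sketched as follows. The report layout here is invented for illustration (a `TRADES` section header followed by whitespace-separated lender, trade type, and balance fields); real credit report formats differ, and the function name is hypothetical.

```python
# A made-up credit report: sections start with a header line, and the
# TRADES section holds whitespace-separated fields (lender, type, balance).
report = """\
SUMMARY
Card member A has four open trades as of 201201.

TRADES
ABC        CARD      500
XYZBANK    CARD      300
AUTOFIN    AUTO      700
HOMELOAN   MORTGAGE  1000
"""


def non_abc_total_balance(text, own_lender="ABC"):
    """Select the TRADES section and sum the balances of all non-ABC trades."""
    total = 0
    in_trades = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "TRADES":
            in_trades = True   # start of the section we want
            continue
        if in_trades:
            if not stripped:
                break          # a blank line ends the section
            lender, _trade_type, balance = stripped.split()
            if lender != own_lender:
                total += int(balance)
    return total


total = non_abc_total_balance(report)
```

The free-text SUMMARY section is skipped entirely, which reflects the point that only some sections of the report carry fields at predictable locations.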

Stack monthly data from the last 12 months to form a time series history
The next ETL process picks up the total monthly balance datasets for the last 12 months from the enterprise data depository, stacks them together to form a 12-month time
series, and loads the result as tables into a database system called Teradata.

Cardmember ID   month    balance
A               201201   1500
A               201112   1400
B               201201   1500
B               201112   1400
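
The stacking step can be sketched in Python as below. The `monthly_files` dictionary is a hypothetical stand-in for monthly flat files that have already been parsed, and only two of the twelve months are shown; the values are those of the table above.

```python
# Hypothetical monthly flat files, already parsed into rows of
# (card member ID, total balance); one dataset per month.
monthly_files = {
    201112: [("A", 1400), ("B", 1400)],
    201201: [("A", 1500), ("B", 1500)],
}


def stack_months(files):
    """Stack monthly datasets into one (member, month, balance) time series."""
    stacked = []
    for month, rows in files.items():
        for member, balance in rows:
            stacked.append((member, month, balance))
    # Order by member, then most recent month first, as in the table above.
    stacked.sort(key=lambda row: (row[0], -row[1]))
    return stacked


time_series = stack_months(monthly_files)
```

The month becomes an explicit column during stacking, which is what turns twelve separate snapshots into a single time series table.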


Data Warehouse
In this particular example, the data warehouse includes both the enterprise data depository for flat files and the Teradata database system.

Enterprise data depository


Teradata (OLAP) systems
o Data Cube
o An example of slice and roll-up: pull the average monthly balance
among all card members in 201112
SELECT AVERAGE(balance) FROM the table WHERE month=201112;
Data cubes are tables with many columns (each called a dimension) and many rows (each storing a fact at the most granular level). Data cubes are organized in a way that facilitates different queries. There
are many types of queries, such as slice, dice, drill down/up, and roll-up. By specifying month=201112, we retrieve one slice of the 3-D data cube, and we then use the aggregate function AVERAGE to
compute the average on the balance dimension for all card members.
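
The slice and roll-up query can be tried out with a small sketch in which SQLite, through Python's standard `sqlite3` module, stands in for Teradata. The table name `balance_history` and its column names are assumptions, and the function is written as `AVG`, the spelling SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance_history (cardmember_id TEXT, month INTEGER, balance REAL)"
)
conn.executemany(
    "INSERT INTO balance_history VALUES (?, ?, ?)",
    [("A", 201201, 1500), ("A", 201112, 1400),
     ("B", 201201, 1500), ("B", 201112, 1400)],
)

# Slice: fix month = 201112. Roll up: average the balance over all card members.
(avg_201112,) = conn.execute(
    "SELECT AVG(balance) FROM balance_history WHERE month = 201112"
).fetchone()

# Roll up along the card member dimension for every month at once.
avg_by_month = dict(conn.execute(
    "SELECT month, AVG(balance) FROM balance_history GROUP BY month"
).fetchall())
conn.close()
```

The `WHERE` clause performs the slice and the aggregate performs the roll-up; replacing the `WHERE` clause with `GROUP BY month` rolls up every slice in a single query.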

Applications
User: Data Analysts
Architecture: Windows (Tier 1), Unix (Tier 2), Teradata (Tier 3)
Activities: Produce a list of current card members who have the highest
total monthly outstanding balance for the marketing team's mail campaign
In terms of applications in this DW framework, data analysts use a PC to access a Unix server, where the statistical analysis software SAS resides. The SAS software then accesses the Teradata
database system so that data analysts can perform the corresponding data analysis. From the DW architecture perspective, Windows is referred to as Tier 1, Unix as Tier 2, and Teradata as Tier 3. In this
example, the end product of the data analysis is a list of current card members who have the highest total monthly outstanding balance in the last 12 months, based on which the marketing team
will launch a targeted mailing campaign for the newly developed loan product.

4. Real-time Data Warehousing


Please go ahead and complete HW2 on real-time data warehousing.
