
IBM Software Group

Designing your BI Architecture


Exploiting your Data Warehouse
David Cope, EDW Architect, Asia Pacific

© 2007 IBM Corporation


The Analytical Evolution


Easy Mining and Alphablox enable insights to be delivered throughout the enterprise.
IBM Differentiator

[Figure: stages of analytical evolution. Axes: Business Value, Decision Empowerment]

Reports: Static, repetitive queries about past results.
Ad Hoc Analysis (Query and OLAP): Empowering analysts to test hypotheses for better decision making.
Insight: Discovering previously unknown and unsuspected information.
Action

IBM DB2 Warehouse Software


Modeling and design
Embedded analytics: Data mining and visualization, In-line analytics
Performance optimization: Data partitioning, Workload control, Deep compression
Data movement and transformation
Database management
Administration and control


DWE OLAP Model


[Figure: layers of the DWE OLAP model]

Cube objects: Cube, Cube dimension, Cube hierarchy, Cube level, Cube facts
Cube model objects: Cube Model, Dimension, Hierarchy, Level, Facts, Measure, Join, Attribute
Relational tables in DB2: fact table, dimension tables



Model-Based Optimization
[Figure: inputs and outputs of the Performance Advisor]

Administrator: Model Information, Time & Space constraints, Query Types
Model: OLAP Metadata
Catalog Tables: Statistics
Base Tables: Data Samples
Performance Advisor output: MQTs

Benefits
Smart Aggregate Selection
Smart Index Selection
SQL Generation
DB2 Exploitation

OLAP Metadata Interchange


[Figure: OLAP metadata is exchanged between the DB2 Data Warehouse and other tools through metadata bridges]

DB2 Data Warehouse: RDBMS Metadata, OLAP Metadata, DDL, DML, DATA
Metadata categories: Model & ETL tool metadata, BI tool metadata
Tools shown: DB2 Alphablox, MITI, Hyperion, BUSINESS OBJECTS, QMF for Windows, QlikTech, ArcPlan


Alphablox
Platform for Customized Analytic Applications and Inline Analytics
Pre-built components (Blox) for analytic functionality
Allows you to create customized analytic components that are embedded into existing business processes and web applications


Alphablox
For end-users:
A web application, portal or dashboard with embedded analytics in an easy-to-use interactive interface

For application developers:
A J2EE application for analysis-oriented interaction
A set of analytic-focused extensions to the application server

Alphablox with DWE:
SQL generated by DWE Design Studio can be pasted into Alphablox pages for warehouse-based embedded analytics

Alphablox Architecture
[Figure: Alphablox architecture]

Web Browser: DHTML-based client, similar to AJAX (XMLHttpRequest)
Application servers: WebLogic, WebSphere, Tomcat
Alphablox Blox: GridBlox, ChartBlox, DataBlox, PresentBlox
Alphablox services: UI Model, Calculations, Bookmarks, Alerts, Comments
Data sources: OLAP (Essbase / MSAS / SAP BW), Alphablox Cubing Engine (ROLAP), Relational Databases, MQ


Relational Cubing Engine & OLAP Optimization


[Figure: tiers and data flows of the relational cubing engine]

Tiers: Customer Tier, Application Server Tier, Database Server Tier
DB2 Alphablox Application: Present Blox, Grid Blox, Chart Blox, Data Blox
DB2 Alphablox Server: Relational Cubing Engine (Relational Cube, cubelets, Cube Definition, Metadata Import)
Database: DB2 Cube Views (OLAP Metadata), DB2 MQTs, Star Schema
Other components: HTTP Server
Flows: MDX, Dimension Data Retrieval, Fact Data Retrieval


Versatile Architecture Support


DB2 Warehouse supports versatile analytics architectures
Analytics directed against:
EDW
External Mart
Internal Mart
Virtual Mart

[Figure: BI Applications and Tools accessing the EDW, External Marts, Internal Marts, and Virtual Marts]

DWE Easy Mining: Mining without a Statistician


Realize the benefits of mining by enabling analysts, rather than relying on statisticians, for your data mining needs

Reporting Tool

DB2 Data Warehouse Edition


Two Types of Data Mining: Discovery & Predictive


Discovery
Automatically find trends and patterns
Answer unasked questions
Relatively undirected analysis
Tool reports on findings
In a word: Easier
Useful for non-statisticians

Predictive
Specific question
Probability associated with outcomes
Directed analysis
Iterative process: Train, Test, Apply
Apply model in database at customer touch points
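
The Train, Test, Apply cycle of predictive mining can be illustrated with a deliberately tiny model. The sketch below is not a DWE algorithm: it trains a one-feature threshold classifier (a "decision stump") on hypothetical data, where the feature is the number of store visits and the label is whether the customer responded to a promotion. All names and numbers are assumptions for illustration.

```python
def train_stump(rows):
    """Train: choose the threshold on one numeric feature that classifies
    the most training rows correctly. rows is a list of (value, label)."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({value for value, _ in rows}):
        correct = sum((value >= threshold) == label for value, label in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, value):
    """Apply: predict True when the feature reaches the learned threshold."""
    return value >= threshold


def accuracy(threshold, rows):
    """Test: share of held-out rows that the model classifies correctly."""
    return sum(predict(threshold, v) == label for v, label in rows) / len(rows)


# Hypothetical data: (visits last year, responded to promotion)
train_rows = [(2, False), (3, False), (5, False), (8, True), (10, True), (12, True)]
test_rows = [(1, False), (4, False), (9, True), (11, True)]

threshold = train_stump(train_rows)            # Train
test_accuracy = accuracy(threshold, test_rows)  # Test
new_customer_prediction = predict(threshold, 9)  # Apply to a new record
```

If the test accuracy is poor, the cycle repeats with revised data or a different model, which is why the process is described as iterative.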


DWE Easy Mining Algorithms


Discovery Methods: finding useful patterns and relationships

Associations
Which item affinities (rules) are in my data?
[Beer => Diapers] single transaction

Sequences
Which sequential patterns are in my data?
[Love] => [Marriage] => [Baby Products] sequential

Clustering
Which interesting groups are in my data?
customer profiles, store profiles

Predictive Methods: predicting values

Classification
How to predict categorical values in my data?
will the patient be cured, harmed, unaffected by treatment?

Regression
How to predict numerical values in my data?
how likely a customer will respond to the promotion
how much will each customer spend this year?

Score data directly in DB2, scalable and real time

[Figure: Business Analyst, Partner, and Statistician & Data Mining Workbench around the DWE Enterprise Data Warehouse. Steps: Select, Transform, Mine, Assimilate. Flows: Selected Data, Extracted Information, Assimilated Information]


How to Recognize a Data Mining Need


What do my customers look like?
Which customers should I target in a promotion?
Which products should I use for the promotion?
How should I lay out my new stores?
Which products should I replenish in anticipation of a promotion?
Which of my customers are most likely to churn?
How can I improve customer loyalty?
What is the most likely item that a customer will purchase next?
Who is most likely to have another heart attack?
What is the likelihood of a part failure?
When one part fails, what other part(s) are most likely to fail soon?
How can I identify high-potential prospects (lead generation)?
How can I detect potential fraud?

High-Level View of the Data Mining Process

Business Problem -> "A minor miracle occurs" -> Insight

Steps shown: Data Warehouse; Extract & Transform data; Build Model; Validate, Refine; Deploy


The Data Mining Process


This is an iterative process!

[Figure: cycle that starts from the Business Problem and the Data Warehouse and runs through data preparation, mining, and deployment]

Data Preparation (ETL): Select Data; Select, Transform
Data Mining (MINING): Mine, Visualize; Discover & Interpret Information; Revise Data & Refine Model
DEPLOY: Apply Results; Report, Score data, Embed in application
Analyze, Understand

Associations
Discovery technique to find associations or affinities among items (or conditions, outcomes, etc.) in a single transaction.
Constructs statements (rules) that quantify the relationships among items that tend to occur together in transactions

Example:
In a supermarket, Cola is bought in 20% of all purchases.
Cola is bought in 60% of the purchases involving Orange juice.
3.7% of all purchases involve both Cola and Orange juice.
The rule [Orange juice] => [Cola] has the following properties:
Support = 3.7%: Cola and OJ are present together in 3.7% of all baskets.
Confidence = 60%: Cola is present in 60% of the baskets containing OJ.
Lift = 60% / 20% = 3: Cola is 3 times as likely to be in the basket when OJ is also present.
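
The three rule properties in this example can be computed directly from a list of baskets. The following is a minimal sketch, not the DWE implementation; the function name and the ten baskets are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def rule_metrics(baskets, body, head):
    """Return (support, confidence, lift) for the rule [body] => [head]."""
    total = len(baskets)
    body_count = sum(1 for basket in baskets if body in basket)
    head_count = sum(1 for basket in baskets if head in basket)
    both_count = sum(1 for basket in baskets if body in basket and head in basket)

    support = both_count / total             # share of baskets with both items
    confidence = both_count / body_count     # share of body baskets that also hold the head
    lift = confidence / (head_count / total)  # confidence relative to the head's base rate
    return support, confidence, lift


# Hypothetical data: 10 baskets, 5 with orange juice, 4 with cola, 3 with both
baskets = [
    {"orange juice", "cola"},
    {"orange juice", "cola"},
    {"orange juice", "cola"},
    {"orange juice", "bread"},
    {"orange juice", "milk"},
    {"cola", "chips"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
    {"chips"},
]

support, confidence, lift = rule_metrics(baskets, "orange juice", "cola")
```

A lift above 1 means the head item appears more often alongside the body item than it does in baskets overall.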

Scoring
Given the item(s) purchased (rule body), what item (rule head) is most likely to be purchased as well?
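
Scoring with association rules can be sketched as a lookup: among the rules whose body is contained in the current basket, pick the head with the highest confidence. This is an illustrative simplification, not the DB2 scoring function; the rule list, its confidences, and the function name are hypothetical.

```python
def best_head(purchased, rules):
    """Return the head of the highest-confidence rule whose body is fully
    contained in `purchased` and whose head is not already purchased."""
    candidates = [
        (confidence, head)
        for body, head, confidence in rules
        if body <= purchased and head not in purchased
    ]
    return max(candidates)[1] if candidates else None


# Hypothetical rules: (rule body, rule head, confidence)
rules = [
    (frozenset({"orange juice"}), "cola", 0.60),
    (frozenset({"chips"}), "cola", 0.45),
    (frozenset({"orange juice", "bread"}), "butter", 0.70),
]

offer_for_oj = best_head({"orange juice"}, rules)
offer_for_oj_and_bread = best_head({"orange juice", "bread"}, rules)
offer_for_milk = best_head({"milk"}, rules)
```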

Common uses
Promotional or cross-sell offers, Disease management, Part failure

Sequences
Discovery technique to find affinities among items (or conditions, outcomes, etc.) across multiple transactions over time.
Quantifies relationships (sequences) to identify the most likely item in the next transaction

[Figure: purchase timelines for example customers]

100% of the customers who get C will get X at a later time
67% of the customers who get B will get X at a later time

Scoring
Given the item(s) purchased previously (rule body), what item (rule head) is most likely to be purchased in a subsequent transaction within a certain time frame?
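
The confidence of a sequence rule, such as the percentages quoted above, is the share of customers who bought the body item and then bought the head item in a later transaction. The sketch below shows that calculation under stated assumptions: the customer histories are hypothetical (they are not a reconstruction of the slide's figure), and the time-frame restriction mentioned under Scoring is omitted for brevity.

```python
def sequence_confidence(histories, body, head):
    """Share of customers with `body` who bought `head` in a later transaction.
    histories maps a customer id to a time-ordered list of item sets."""
    with_body = 0
    followed_by_head = 0
    for visits in histories.values():
        # Position of the first transaction that contains the body item
        first = next((i for i, items in enumerate(visits) if body in items), None)
        if first is None:
            continue
        with_body += 1
        if any(head in items for items in visits[first + 1:]):
            followed_by_head += 1
    return followed_by_head / with_body if with_body else 0.0


# Hypothetical histories, oldest transaction first
histories = {
    "cust1": [{"G", "B"}, {"A"}, {"X"}],
    "cust2": [{"B"}, {"A"}, {"Y"}],
    "cust3": [{"C"}, {"B"}, {"X"}],
    "cust4": [{"C"}, {"X"}],
}

confidence_b_then_x = sequence_confidence(histories, "B", "X")
confidence_c_then_x = sequence_confidence(histories, "C", "X")
```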

Common uses
Fraud detection, Promotional offers, Disease management, Part failure

Clustering
Discovery technique to find clusters having distinct behaviors and characteristics
Gain insights into customers, stores, insurance claims, etc.
Generate distinct behavioral/demographic profiles
Understand the most important attributes of each cluster

Create a model to assign individuals to best-fit clusters
Apply model to assign new individuals or re-assign existing individuals
Design business actions tailored to different characteristic profiles

Scoring
Apply model to assign each record to its best-fit cluster
Apply appropriate business action for each record based on its assigned cluster
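
Assigning a record to its best-fit cluster can be sketched as a nearest-centroid lookup. This is a stand-in technique chosen for brevity, not the clustering algorithm that DWE uses; the cluster names, the two features (shopping days per year, average spend per visit), and the centroid values are all hypothetical.

```python
import math


def assign_cluster(record, centroids):
    """Return the name of the cluster whose centroid is closest to `record`
    by Euclidean distance."""
    return min(centroids, key=lambda name: math.dist(record, centroids[name]))


# Hypothetical centroids: (shopping days per year, average spend per visit)
centroids = {
    "frequent_big_spenders": (30.0, 200.0),
    "occasional_shoppers": (4.0, 50.0),
}

cluster_a = assign_cluster((25.0, 180.0), centroids)
cluster_b = assign_cluster((5.0, 60.0), centroids)
```

In practice the features would usually be put on a comparable scale first, so that a large-valued attribute such as spend does not dominate the distance.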

Common uses
Customer segmentation, store profiling, deviation detection

Classification
Prediction technique to classify individuals by outcome
Classify by a categorical class variable (e.g., YES-NO-MAYBE response)
Understand the most important factors (predictors) leading to each outcome

Modeling
Create a model to classify individuals according to expected outcome
Design business action based on most important predictors

Scoring
Apply model to predict the outcome for each individual
New prospects (expected behavior)
Existing individuals (changes in behavior)
Identify target individuals for business action

Common uses
Customer attrition (churn), Part failure

Regression
Set of predictive techniques to predict a dependent variable
Predict continuous value or binary numeric value
Continuous: e.g., revenue (prediction represents amount of revenue)
Binary: e.g., 0=No, 1=Yes (prediction represents probability of Yes)
Understand the most important predictors of the dependent variable
Transform regression, linear regression, polynomial regression

Modeling
Create a model to predict the dependent variable
Design business action (e.g., predict likelihood of default for a loan application, in real time)

Scoring
Apply model to generate a prediction for each individual (e.g., probability of part failure)
Identify target individuals for business action
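
The modeling and scoring steps for a continuous target can be sketched with ordinary least squares on a single predictor. This covers only the simplest linear case, not transform or polynomial regression, and it is not the DWE implementation; the data (store visits versus yearly revenue per customer) and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Modeling: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def score(model, x):
    """Scoring: apply the fitted model to one new individual."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: visits per year and revenue per customer
visits = [1, 2, 3, 4]
revenue = [30.0, 50.0, 70.0, 90.0]

model = fit_line(visits, revenue)
predicted_revenue = score(model, 5)
```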

Common uses
Predict revenue/cost/profitability, Predict risk of loan default

Data exploration
DWE enables you to explore the data.
Check data quality (prior to performing ETL for data preparation) and gain a general understanding of the data

Design Studio provides four tools to inspect data:


Table sampling
Univariate distributions
Bivariate distributions
Multivariate distributions

All these tools are accessible by right-clicking on a table/view/alias/nickname in the database explorer:
-> Data for table sampling/editing
-> Value Distributions for multivariate/univariate/bivariate distributions
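
What a univariate distribution reports can be shown outside the tool with a few lines of code: the relative frequency of each value in one column, plus a count of missing values as a basic data-quality check. This is only an illustration of the idea, not how Design Studio computes it; the rows and the column name are hypothetical.

```python
from collections import Counter


def univariate_distribution(rows, column):
    """Return (relative frequency per value, number of missing values)
    for a single column of a list of row dictionaries."""
    values = [row.get(column) for row in rows]
    missing = sum(1 for value in values if value is None)
    counts = Counter(value for value in values if value is not None)
    total = sum(counts.values())
    frequencies = {value: count / total for value, count in counts.items()} if total else {}
    return frequencies, missing


# Hypothetical sample of a customer table
rows = [
    {"gender": "F"},
    {"gender": "F"},
    {"gender": "M"},
    {"gender": None},
]

frequencies, missing = univariate_distribution(rows, "gender")
```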

Leveraging Mining and Alphablox: DWE Miningblox


Create web applications that provide access to DWE Data Mining
Extends the DB2 Alphablox API with mining-specific functionality
With Miningblox, you can perform the following tasks:
Selecting input data
Processing input data
Displaying mining results graphically in a Web browser, for example, the characteristics of a customer segment
Administering or managing mining runs
Typically, a web application using Miningblox tags might be integrated in a business application or an intranet portal.


Why use Miningblox?


Provide access to Data Mining for a group of business analysts.
Create a Miningblox web application that provides access to mining functionality through the Web browser, with no need to install software on the client machines.
Analysts can execute mining runs and view results in a customized web application without extensive knowledge of mining software.
With the Miningblox Application wizard in the DWE Design Studio, you can easily create Web applications by selecting sample templates, or you can extend Alphablox applications with mining functionality.


Deployment through Alphablox application example


MBA application console



MBA execution



MBA completion



MBA results report


Case Study: Retail Department Store


Analytics with Data Mining and Alphablox

Retail Department Store Chain


Business requirements
Perform a data mining POC (really a pilot project) to support the original DWE decision, ensure success, and highlight DWE capabilities for further uptake

Define business problem
Boost storewide sales (across other departments) based on women's shoes

Define analytical approach and ETL procedure
Extract all transactions of customers who have purchased women's shoes
Transform transactional data into one record per customer, for customer segmentation
Perform market basket analysis (MBA) for high-potential customers who have purchased women's shoes
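
The transformation of transactional data into one record per customer can be sketched as a simple aggregation. The sketch is illustrative only: the input layout (customer id, visit date, amount) and the summary attributes are assumptions, not the fields used in the actual engagement.

```python
from collections import defaultdict


def customer_records(transactions):
    """Collapse (customer_id, visit_date, amount) rows into one summary
    record per customer, suitable as input for customer segmentation."""
    summary = defaultdict(lambda: {"visits": set(), "purchases": 0, "total_spend": 0.0})
    for customer_id, visit_date, amount in transactions:
        record = summary[customer_id]
        record["visits"].add(visit_date)   # distinct shopping days
        record["purchases"] += 1           # number of purchase lines
        record["total_spend"] += amount
    return {
        customer_id: {
            "visit_days": len(record["visits"]),
            "purchases": record["purchases"],
            "total_spend": record["total_spend"],
        }
        for customer_id, record in summary.items()
    }


# Hypothetical transactions
transactions = [
    ("c1", "2007-01-05", 120.0),
    ("c1", "2007-01-05", 30.0),
    ("c1", "2007-02-10", 50.0),
    ("c2", "2007-03-01", 80.0),
]

records = customer_records(transactions)
```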

Challenges
Engagement sponsored by IT with limited access to business users (LOB)


Solution Overview
Prepare data for mining by:
Pulling transactions for women's shoe customers
Creating data for customer segmentation

Use DB2 Mining to perform:
Clustering: Identify high-potential customer segments
Market Basket Analysis for high-potential segments: Identify associated items; Identify next-most-likely purchases

Deploy mining results in Alphablox
Integrate data mining information into the dashboard and as part of the guided analysis

Build a dashboard in Alphablox:
Provide critical information and metrics in an Alphablox dashboard to merchandising and marketing
Integrate powerful visualization to make it easier to identify problem areas

[Figure: solution layers. Analytical Dashboard (Alphablox Heat Maps / Other Visualization, Data Mining Visualizer / Alphablox); Cubing Engine and Data Mining API; DB2 Data Warehouse with Mining Models & Services (Clustering, Associations & Sequences, Scoring Services)]


Business Scenario for Mining


Business requirements for POC
Focus on customers who have purchased women's shoes in the past 12 months
Boost storewide sales (across other departments) based on women's shoes
Increase wallet share from high-potential customers

Business questions to be answered
What do my women's shoes customers look like?
Which of these customers should I target in a promotion?
Which products should I use for the promotion?
Which products should I replenish in anticipation of a promotion?
How can I improve customer loyalty?
What is the most likely item that a women's shoes customer will purchase next?


Step 1: Identify High-Potential Shoe Customers


Result: 16 Distinct Clusters Created


Cluster 1: Those who Act Like VIPs

Frequent Shoppers
VIPs
Active Shoppers
Respond to Discounts
Big Spenders
High Returns

High Potential Customers!



Cluster 6: Frequent Good Shoppers

Shop Here 30 days/yr

Above-Avg Purchases

Above-Avg Spending

Respond to Discounts

Average Returns

High Potential Customers!



Step 2: Identify Associated Items for Clusters 1 & 6


Extracted transactions for those clusters of customers
Performed market basket analysis and interpreted results
Associations (items purchased together in one visit)


Identify Items Purchased Together for Clusters 1 & 6


Results: Associations for Clusters 1 & 6


Step 3: Identify Next Likely Purchase for Clusters 1 & 6


Extracted transactions for those clusters of customers
Performed market basket analysis and interpreted results
Sequences (next most likely purchase in a future visit)


Identify Next Likely Purchases for Clusters 1 & 6


Results: Sequences for Customers in Clusters 1 & 6


Results and Future Ideas


Deployment of customer segmentation and MBA
End-user application with Alphablox
Create & refresh mining models
Identify high-potential customer segments
Refresh assignment of each customer to best-fit cluster
Target selected customer segments for promotions
Batch scoring to identify best offer(s) for each customer/segment
Merchandising now has a view of their customers, not just products

Future ideas
Score a customer at checkout register in real time
MBA scoring (associations, sequences)
Focused MBA scoring for known customers, based on best-fit cluster
Make an offer to induce customers to visit other departments before leaving the store
