Figure: Reports; Query and OLAP; Ad Hoc Analysis; Insight; Business Value; Action.
Ad hoc analysis: discovering previously unknown and unsuspected information, and empowering analysts to test hypotheses for better decision making.
Performance optimization: data partitioning, workload control, deep compression
Figure: measures and dimension tables.
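Model-Based Optimization
Figure: the administrator's model is stored as OLAP metadata in the catalog tables, alongside the base tables; the Performance Advisor uses this model to recommend MQTs (materialized query tables).
Benefits: smart aggregate selection, smart index selection, SQL generation, DB2 exploitation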
Figure: shared OLAP metadata. DB2 holds the data and the RDBMS metadata (DDL, DML) together with OLAP metadata, which is exchanged with the BI tool metadata of DB2 Alphablox, MITI, Hyperion, Business Objects, QlikTech, and ArcPlan.
Alphablox
Platform for customized analytic applications and inline analytics
Pre-built components (Blox) for analytic functionality
Allows you to create customized analytic components that are embedded into existing business processes and web applications
Alphablox
For end users: a web application, portal, or dashboard with embedded analytics in an easy-to-use interactive interface
For application developers: a J2EE application for analysis-oriented interaction, and a set of analytic-focused extensions to the application server
Alphablox with DWE: SQL generated by DWE Design Studio can be pasted into Alphablox pages for warehouse-based embedded analytics
Alphablox Architecture
Figure: a DHTML-based web browser client (similar to AJAX, using XMLHttpRequest) connects to Alphablox running on WebLogic, WebSphere, or Tomcat. Components shown: UI, Model, GridBlox, ChartBlox, DataBlox, PresentBlox, calculations, bookmarks, alerts, comments. Data sources: relational databases, MQ.
Figure: customer tier, HTTP server, DataBlox, MDX.
Figure: mart options relative to the EDW: external marts, internal marts, virtual marts.
Reporting Tool
Predictive
Specific question
Probability associated with outcomes
Directed analysis
Iterative process: train, test, apply
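The train, test, apply loop can be sketched in Python. This is a minimal illustration, not how DWE runs it: the data is assumed to fit in memory, the 70/30 split and fixed seed are arbitrary choices, and the helpers `holdout_split`, `train_model`, and `evaluate` are hypothetical. The "model" is a majority-class baseline standing in for a real mining model, so the loop itself stays visible.

```python
import random
from collections import Counter

def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the labeled rows and split off a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_model(rows):
    """Train: a majority-class baseline, standing in for a real mining model."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda features: majority

def evaluate(model, rows):
    """Test: fraction of held-out rows whose known outcome the model predicts."""
    return sum(model(features) == label for features, label in rows) / len(rows)

# Hypothetical labeled data: 10 customers, "YES" = responded to an offer.
rows = [({"calls": i}, "YES" if i > 7 else "NO") for i in range(10)]
train_rows, test_rows = holdout_split(rows)
model = train_model(train_rows)
test_accuracy = evaluate(model, test_rows)
# Apply: score a new, unlabeled record once the test result is acceptable.
prediction = model({"calls": 3})
```

If the test result is poor, the process iterates: refine the data or the model and repeat the split, train, and test steps before applying it.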
DWE
Sequences: Which sequential patterns are in my data? For example, [Love] => [Marriage] => [Baby Products]
Clustering
Classification: How can I predict categorical values in my data? For example, will the patient be cured, harmed, or unaffected by treatment?
Regression: How likely is a customer to respond to the promotion? How much will each customer spend this year?
Select, Transform, Mine
Score data directly in DB2, scalably and in real time
How should I lay out my new stores?
Which products should I replenish in anticipation of a promotion?
Which of my customers are most likely to churn?
How can I improve customer loyalty?
What is the most likely item that a customer will purchase next?
Who is most likely to have another heart attack?
What is the likelihood of a part failure?
When one part fails, what other part(s) are most likely to fail soon?
How can I identify high-potential prospects (lead generation)?
How can I detect potential fraud?
Figure: mining cycle: Data Warehouse; Extract & Transform data; Build Model; Validate, Refine; Deploy; Insight.
Figure: data mining process. Stages: ETL / data preparation, data mining, mining deploy. Steps: select data; select, transform; mine, visualize; analyze, understand; discover and interpret information; apply results.
Associations
Discovery technique to find associations or affinities among items (or conditions, outcomes, etc.) in a single transaction.
Constructs statements (rules) that quantify the relationships among items that tend to occur together in transactions
Example:
In a supermarket, Cola is bought in 20% of all purchases. Cola is bought in 60% of the purchases involving orange juice (OJ). 3.7% of all purchases involve both Cola and orange juice.
The rule [Orange juice] => [Cola] has the following properties:
Support = 3.7%: Cola and OJ are present together in 3.7% of all baskets.
Confidence = 60%: Cola is present in 60% of the baskets containing OJ.
Lift = 60% / 20% = 3: Cola is 3 times as likely to be in the basket when OJ is also present.
Scoring
Given the item(s) purchased (rule body), what item (rule head) is most likely to be purchased as well?
Common uses
Promotional or cross-sell offers, Disease management, Part failure
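The support, confidence, and lift definitions in the example above, together with the scoring step, can be sketched in plain Python. This is an illustrative sketch, not the DWE implementation: baskets are assumed to be in-memory Python sets, and the helpers `rule_metrics` and `recommend`, the toy baskets, and the rule list format are hypothetical.

```python
def rule_metrics(baskets, body, head):
    """Support, confidence and lift of the rule body => head over a list of baskets (sets)."""
    n = len(baskets)
    n_body = sum(1 for b in baskets if body <= b)          # baskets containing the body
    n_head = sum(1 for b in baskets if head <= b)          # baskets containing the head
    n_both = sum(1 for b in baskets if (body | head) <= b)  # baskets containing both
    support = n_both / n
    confidence = n_both / n_body
    lift = confidence / (n_head / n)  # confidence relative to the head's base rate
    return support, confidence, lift

def recommend(rules, basket):
    """Scoring: head of the highest-confidence rule whose body is already in the basket."""
    matches = [(conf, head) for body, head, conf in rules
               if body <= basket and head not in basket]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]

# Hypothetical toy baskets.
baskets = [{"oj", "cola"}, {"oj", "bread"}, {"cola"}, {"bread"}, {"oj", "cola", "bread"}]
support, confidence, lift = rule_metrics(baskets, {"oj"}, {"cola"})
rules = [(frozenset({"oj"}), "cola", confidence)]
offer = recommend(rules, {"oj"})

# The supermarket example: lift = confidence / P(Cola) = 0.60 / 0.20, i.e. about 3.
slide_lift = 0.60 / 0.20
```

Here OJ appears in 3 of 5 baskets, Cola in 3 of 5, and both in 2, so support is 0.4, confidence is 2/3, and lift is (2/3) / (3/5), about 1.11.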
Sequences
Discovery technique to find affinities among items (or conditions, outcomes, etc.) across multiple transactions over time.
Quantifies relationships (sequences) to identify the most likely item in the next transaction
Scoring
Given the item(s) purchased previously (rule body), what item (rule head) is most likely to be purchased in a subsequent transaction within a certain time frame?
Common uses
Fraud detection, Promotional offers, Disease management, Part failure
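A sequence rule and its scoring can be sketched in Python by counting, per customer, which items are bought in a later transaction within a time window after another item. This is a simplified stand-in for a real sequential-pattern miner (it only handles single-item rules a => b), and the function names, the `window_days` parameter, and the camping items are hypothetical.

```python
from collections import defaultdict
from itertools import product

def sequence_rules(histories, window_days):
    """Mine rules a => b: a customer buys b in a later transaction within window_days of a.

    histories maps customer id -> list of (day, set_of_items) transactions.
    Returns {(a, b): (support, confidence)}, counted per customer.
    """
    bought = defaultdict(set)    # item -> customers who bought it
    followed = defaultdict(set)  # (a, b) -> customers who bought b after a
    for customer, txns in histories.items():
        txns = sorted(txns, key=lambda t: t[0])
        for i, (day_i, items_i) in enumerate(txns):
            for a in items_i:
                bought[a].add(customer)
            for day_j, items_j in txns[i + 1:]:
                if day_j - day_i > window_days:
                    break  # sorted by day, so every later transaction is further away
                for a, b in product(items_i, items_j):
                    followed[(a, b)].add(customer)
    n = len(histories)
    return {(a, b): (len(c) / n, len(c) / len(bought[a]))
            for (a, b), c in followed.items()}

def predict_next(rules, purchased):
    """Scoring: most confident next item given the items already purchased."""
    best = None
    for (a, b), (_, confidence) in rules.items():
        if a in purchased and b not in purchased:
            if best is None or confidence > best[1]:
                best = (b, confidence)
    return best

histories = {
    "c1": [(1, {"tent"}), (10, {"sleeping bag"})],
    "c2": [(3, {"tent"}), (5, {"sleeping bag"}), (40, {"stove"})],
    "c3": [(2, {"tent"}), (60, {"stove"})],
}
rules = sequence_rules(histories, window_days=30)
next_item = predict_next(rules, {"tent"})
```

With a 30-day window, only tent => sleeping bag survives (2 of 3 customers, 2 of the 3 tent buyers), because both stove purchases fall outside the window.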
Clustering
Discovery technique to find clusters having distinct behaviors and characteristics
Gain insights into customers, stores, insurance claims, etc.
Generate distinct behavioral/demographic profiles
Understand the most important attributes of each cluster
Scoring
Apply model to assign each record to its best-fit cluster
Apply appropriate business action for each record based on its assigned cluster
Common uses
Customer segmentation, store profiling, deviation detection
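Building clusters and scoring records against them can be sketched with k-means in Python. k-means is a swapped-in technique used only for illustration; it is not necessarily the clustering algorithm DWE provides. The naive initialization (first k points), the iteration count, and the two customer features are assumptions.

```python
import math

def nearest(point, centroids):
    """Scoring: index of the best-fit (closest) cluster for one record."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, iterations=20):
    """Plain k-means with naive initialization from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for p in points:
            members[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [tuple(sum(axis) / len(m) for axis in zip(*m)) if m else centroids[c]
                     for c, m in enumerate(members)]
    return centroids

# Hypothetical customers: (purchases per year, average spend in $100s).
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids = kmeans(customers, k=2)
segment = nearest((9, 9), centroids)
```

Each centroid serves as the profile of its segment, and `nearest` is the scoring step that assigns a new record to its best-fit cluster. A production tool would use a better initialization than the first k points.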
Classification
Prediction technique to classify individuals by outcome
Classify by a categorical class variable (e.g., YES-NO-MAYBE response)
Understand the most important factors (predictors) leading to each outcome
Modeling
Create a model to classify individuals according to expected outcome
Design business action based on most important predictors
Scoring
Apply model to predict the outcome for each individual:
New prospects (expected behavior)
Existing individuals (changes in behavior)
Identify target individuals for business action
Common uses
Customer attrition (churn), Part failure
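A deliberately tiny classification model can be sketched in Python as a one-split decision tree (a decision stump) chosen by Gini impurity. This is a stand-in technique, not the DWE classifier; the chosen feature plays the role of the most important predictor. The churn data, feature names, and helper names are hypothetical, and the sketch assumes at least one feature takes two distinct values.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """Modeling: the single feature/threshold split with the lowest weighted Gini impurity.

    rows are dicts of numeric predictors; returns (feature, threshold, left_class, right_class).
    """
    best = None
    for feature in rows[0]:
        for threshold in sorted({r[feature] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold, majority(left), majority(right))
    return best[1:]

def classify(stump, row):
    """Scoring: predict the outcome for one individual."""
    feature, threshold, left_class, right_class = stump
    return left_class if row[feature] <= threshold else right_class

rows = [
    {"tenure_months": 2, "support_calls": 5},
    {"tenure_months": 3, "support_calls": 4},
    {"tenure_months": 24, "support_calls": 1},
    {"tenure_months": 36, "support_calls": 0},
]
churned = ["YES", "YES", "NO", "NO"]
stump = fit_stump(rows, churned)
```

Both features separate this toy data perfectly; ties keep the first split found, so the stump splits on `tenure_months` at 3 months.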
Regression
Set of predictive techniques to predict a dependent variable
Predict a continuous value or a binary numeric value:
Continuous: e.g., revenue (prediction represents amount of revenue)
Binary: e.g., 0=No, 1=Yes (prediction represents probability of Yes)
Understand the most important predictors of the dependent variable
Techniques: transform regression, linear regression, polynomial regression
Modeling
Create a model to predict the dependent variable
Design business action (e.g., predict likelihood of default for a loan application, in real time)
Scoring
Apply model to generate a prediction for each individual (e.g., probability of part failure)
Identify target individuals for business action
Common uses
Predict revenue/cost/profitability, Predict risk of loan default
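Linear regression, one of the techniques listed above, can be sketched in Python with ordinary least squares for a single predictor. This covers only the simplest case; transform and polynomial regression, and multiple predictors, are not shown. The visits/revenue data and the helper names are hypothetical.

```python
def fit_linear(xs, ys):
    """Modeling: ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Scoring: apply the fitted line to a new individual."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical data: store visits per year (x) vs. yearly revenue in $k (y).
visits = [1, 2, 3, 4]
revenue = [3, 5, 7, 9]
model = fit_linear(visits, revenue)
forecast = predict(model, 5)
```

The toy data lies exactly on y = 2x + 1, so the fit recovers slope 2 and intercept 1, and a customer with 5 visits scores 11.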
Data exploration
DWE enables you to explore the data:
Check data quality (prior to performing ETL for data preparation) and gain a general understanding of the data.
All these tools are accessible by right-clicking a table, view, alias, or nickname in the Database Explorer:
-> Data: table sampling and editing
-> Value Distributions: univariate, bivariate, and multivariate distributions
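The kinds of checks the Value Distributions view supports can be approximated in Python for rows already loaded into memory. This is not the DWE tool itself; the column names, the sample rows, and the `missing` helper (a simple data quality check) are hypothetical.

```python
from collections import Counter

def univariate(rows, column):
    """Value distribution of one column: value -> count."""
    return Counter(row[column] for row in rows)

def bivariate(rows, col_a, col_b):
    """Joint distribution of two columns: (value_a, value_b) -> count."""
    return Counter((row[col_a], row[col_b]) for row in rows)

def missing(rows, column):
    """Data quality check: number of rows with no value in the column."""
    return sum(1 for row in rows if row.get(column) is None)

rows = [
    {"region": "East", "segment": "Gold"},
    {"region": "East", "segment": "Silver"},
    {"region": "West", "segment": "Gold"},
    {"region": None, "segment": "Gold"},
]
segment_counts = univariate(rows, "segment")
region_by_segment = bivariate(rows, "region", "segment")
missing_regions = missing(rows, "region")
```

Running checks like these before ETL shows skewed values and gaps (here, one row without a region) early.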
Challenges
Engagement sponsored by IT with limited access to business users (LOB)
Solution Overview
Prepare data for mining by:
Pulling transactions for women's shoe customers
Creating data for customer segmentation
Figure: solution components: analytical dashboard; Alphablox heat maps and other visualization; Data Mining Visualizer / Alphablox; cubing engine.
Figure: customer segment profiles: Big Spenders, High Returns, Above-Avg Purchases, Above-Avg Spending, Respond to Discounts, Average Returns.
Future ideas
Score a customer at the checkout register in real time
Market basket analysis (MBA) scoring (associations, sequences)
Focused MBA scoring for known customers, based on their best-fit cluster
Make an offer to induce customers to visit other departments before leaving the store
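The idea of focused MBA scoring can be sketched in Python: select the association rules mined for the customer's best-fit cluster, fall back to a general rule set for unknown customers, and return the highest-confidence offer for the current basket. This is a hypothetical design sketch. The rule format (body, head, confidence), the product names, and the confidence values are all invented for illustration; only the segment names come from the case study above.

```python
def score_at_checkout(basket, cluster, rules_by_cluster, default_rules):
    """Return the best offer for this basket using the rules of the customer's cluster.

    Unknown customers (cluster None or not in rules_by_cluster) get default_rules.
    Each rule is (body: frozenset, head: item, confidence: float).
    """
    rules = rules_by_cluster.get(cluster, default_rules)
    best = None
    for body, head, confidence in rules:
        if body <= basket and head not in basket:  # rule fires, head not yet bought
            if best is None or confidence > best[1]:
                best = (head, confidence)
    return best[0] if best else None

# Hypothetical per-cluster rule sets.
rules_by_cluster = {
    "Big Spenders": [(frozenset({"shoes"}), "handbag", 0.4)],
    "Respond to Discounts": [(frozenset({"shoes"}), "socks", 0.5)],
}
default_rules = [(frozenset({"shoes"}), "shoe polish", 0.3)]

offer = score_at_checkout({"shoes"}, "Big Spenders", rules_by_cluster, default_rules)
```

Keeping one small rule set per cluster keeps the lookup cheap enough for a real-time checkout, at the cost of mining and maintaining several rule sets.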