By
Dr. Atanu Rakshit
Email: atanu.rakshit@iimrohtak.ac.in
atanu.raks@gmail.com
Business Analytics
Text Book:
Business Intelligence: A Managerial Approach by
Efraim Turban, Ramesh Sharda, Dursun Delen and
David King, 2/e, Pearson, 2012
Reference Material:
Business Analytics for Managers by Gert H. N.
Laursen and Jesper Thorlund, Wiley, 2010
Decision Support and Business Intelligence
Systems by Efraim Turban, Ramesh Sharda and
Dursun Delen, 9/e, Pearson, 2012
Business Intelligence Strategy A Practical Guide
for Achieving BI Excellence by John Boyer, Bill
Frank, Brian Green and Tracy Harris, MC Press,
2010
Sessions Plan
Introduction to Data
Warehousing
Learning Objectives
Understand the basic definitions and concepts of
data warehouses
Learn different types of data warehousing
architectures; their comparative advantages and
disadvantages
Describe the processes used in developing and
managing data warehouses
Explain data warehousing operations
Explain the role of data warehouses in decision
support
Explain data integration and the extraction,
transformation, and load (ETL) processes
Describe real-time (a.k.a. right-time and/or active)
data warehousing
Understand data warehouse administration and
security issues
Opening Vignette
DirecTV Thrives with Active Data Warehousing
Company background
Problem description
Proposed solution
Results
Answer & discuss the case questions.
DW definition
Characteristics of DW
Data Marts
ODS, EDW, Metadata
DW Framework
DW Architecture & ETL Process
DW Development
DW Issues
Characteristics of DW
Subject oriented
Integrated
Time-variant (time series)
Nonvolatile
Summarized
Not normalized
Metadata
Web based, relational/multi-dimensional
Client/server
Real-time and/or right-time (active)
Data Mart
A departmental data warehouse that stores
only relevant data
Dependent data mart
A subset that is created directly from a data
warehouse
Independent data mart
A small data warehouse designed for a
strategic business unit or a department
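The dependent data mart described above can be sketched as a subset copied out of the warehouse. This is a minimal illustration only: the rows, the field names, and the helper `build_dependent_mart` are hypothetical and do not come from the textbook.

```python
# Hypothetical warehouse rows; departments, regions and amounts are invented.
warehouse = [
    {"dept": "Sales", "region": "N", "amount": 120},
    {"dept": "Sales", "region": "S", "amount": 80},
    {"dept": "HR", "region": "N", "amount": 15},
]

def build_dependent_mart(warehouse_rows, dept):
    """Copy only the rows relevant to one department out of the warehouse."""
    return [dict(row) for row in warehouse_rows if row["dept"] == dept]

# A dependent data mart is derived directly from the warehouse.
sales_mart = build_dependent_mart(warehouse, "Sales")
print(sales_mart)
```

An independent data mart, by contrast, would be loaded from the source systems without going through the warehouse.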
DW Framework
DW Architecture
Three-tier architecture
1.
2.
3.
Two-tier architecture
The first two tiers of the three-tier architecture are combined into one
DW Architectures
OLAP Definition
OLAP is implemented in a multi-user client/server
mode and offers consistently rapid response to queries,
regardless of database size and complexity. OLAP
helps the user synthesize enterprise information
through comparative, personalized viewing, as well as
through analysis of historical and projected data in
various "what-if" data model scenarios. This is
achieved through use of an OLAP Server.
OLAP Server
An OLAP server is a high-capacity, multi-user data
manipulation engine specifically designed to support
and operate on multi-dimensional data structures.
A multi-dimensional structure is arranged so that
every data item is located and accessed based on the
intersection of the dimension members which define
that item.
The design of the server and the structure of the data
are optimized for rapid ad-hoc information retrieval
in any orientation, as well as for fast, flexible
calculation and transformation of raw data based on
formulaic relationships.
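The idea that every data item is located at the intersection of dimension members can be sketched in a few lines of Python. The cube contents and the helper name `cell` are hypothetical and chosen only for illustration; a real OLAP server stores its data far more compactly.

```python
# Hypothetical toy cube: each cell is addressed by one member from each
# dimension (product, region, month). The values are made-up sales figures.
cube = {
    ("Juice", "N", 1): 10,
    ("Juice", "S", 1): 7,
    ("Cola", "N", 1): 47,
    ("Cola", "N", 2): 30,
    ("Milk", "W", 2): 12,
}

def cell(product, region, month):
    """Return the value at the intersection of the three dimension members,
    or 0 when that combination holds no data (the cube is sparse)."""
    return cube.get((product, region, month), 0)

print(cell("Cola", "N", 1))   # a populated intersection
print(cell("Milk", "N", 1))   # an empty intersection
```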
OLAP Server
The OLAP Server may either physically stage the
processed multi-dimensional information to deliver
consistent and rapid response times to end users, or it
may populate its data structures in real-time from
relational or other databases, or offer a choice of
both.
Given the current state of technology and the end
user requirement for consistent and rapid response
times, staging the multi-dimensional data in the
OLAP Server is often the preferred method.
Multi-dimensional Data
"Hey... I sold $100M worth of goods"
Dimensions: Product, Region, Time
Hierarchical summarization paths
[Figure: sales data cube with axes Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (W, S, N), and Month (1 to 7)]
Product: Industry > Category > Product
Region: Country > Region > City > Office
Time: Year > Quarter > Month > Day (Week as a separate level above Day)
[Figure: two-dimensional view of the cube, Product (Juice, Cola, Milk, Cream) by Date, with cell values 10, 47, 30, 12]
[Figure: cube with dimensions Product (Household, Telecomm, Video, Audio), Region (Europe, Far East, India), and Sales Channel (Retail, Direct, Special)]
[Figure: drill-down path from Sales Channel through Region, Country, State, Location Address, and Sales Representative to low-level details]
A Web-based DW Architecture
[Figure: a client (Web browser) connects over the Internet/intranet/extranet to a Web server that serves Web pages and works with an application server and the data warehouse]
Alternative DW Architectures
1.
2.
3.
4.
5.
[Figure: ETL flow. Sources (packaged application, legacy system, other internal applications) pass through Extract, Transform, Cleanse, and Load into the data warehouse and data mart]
ETL
Issues affecting the purchase of an ETL tool
Data transformation tools are expensive
Data transformation tools may have a long learning curve
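The extract, transform, cleanse, and load steps can be illustrated with a small hand-written pipeline. This is only a sketch under stated assumptions: the source rows, field names, and the `sales` table are invented, and Python's built-in `sqlite3` module stands in for the warehouse. Commercial ETL tools do far more than this.

```python
import sqlite3

# Hypothetical source rows standing in for a legacy system and a
# packaged application; field names and values are invented.
legacy_rows = [
    {"cust": " Alice ", "amount": "100.50", "country": "no"},
    {"cust": "Bob", "amount": "n/a", "country": "SE"},
]
packaged_rows = [
    {"customer_name": "Carol", "total": 75.0, "country_code": "NO"},
]

def extract():
    """Extract: pull raw records from each source into one common shape."""
    for r in legacy_rows:
        yield {"customer": r["cust"], "amount": r["amount"], "country": r["country"]}
    for r in packaged_rows:
        yield {"customer": r["customer_name"], "amount": r["total"],
               "country": r["country_code"]}

def transform(record):
    """Transform: trim names, upper-case country codes, convert amounts."""
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        amount = None  # unparseable amount, handled in the cleanse step
    return {
        "customer": record["customer"].strip(),
        "amount": amount,
        "country": record["country"].upper(),
    }

def cleanse(records):
    """Cleanse: drop records whose amount could not be parsed."""
    return [r for r in records if r["amount"] is not None]

def load(conn, records):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount, :country)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, cleanse([transform(r) for r in extract()]))
rows = conn.execute(
    "SELECT customer, amount, country FROM sales ORDER BY customer"
).fetchall()
print(rows)
```

Bob's record is dropped in the cleanse step because its amount cannot be converted to a number.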
Representation of Data in DW
Dimensional modeling: a retrieval-based system that supports high-volume query access
Star schema: the most commonly used and the simplest style of dimensional modeling
Contains a fact table surrounded by and connected to several dimension tables
The fact table contains the numerical measures (facts) needed to perform decision analysis and query reporting
Dimension tables contain classification and aggregation information about the values in the fact table
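A star schema can be sketched with Python's built-in `sqlite3` module. All table names, column names, and rows below are hypothetical; the point is only the shape: one fact table holding the measure and foreign keys, joined to dimension tables for grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: classification and aggregation attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, month INTEGER, quarter INTEGER);
-- Fact table: a foreign key to each dimension plus the numeric measure.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    units_sold INTEGER
);
INSERT INTO dim_product VALUES (1, 'Juice', 'Drinks'), (2, 'Soap', 'Household');
INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_time    VALUES (1, 1, 1), (2, 4, 2);
INSERT INTO fact_sales  VALUES (1, 1, 1, 10), (1, 2, 1, 5), (2, 1, 2, 8), (1, 1, 2, 3);
""")

# Aggregate the measure by attributes taken from two dimension tables.
totals = conn.execute("""
    SELECT p.category, t.quarter, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_time t    ON t.time_id    = f.time_id
    GROUP BY p.category, t.quarter
    ORDER BY p.category, t.quarter
""").fetchall()
print(totals)
```

Every query follows the same pattern: start at the fact table, join out to whichever dimensions are needed, and group by their attributes.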
Multidimensionality
The ability to organize, present, and analyze data by
several dimensions, such as sales by region, by product, by
salesperson, and by time (four dimensions)
Multidimensional presentation
Dimensions: products, salespeople, market segments, business units,
geographical locations, distribution channels, country, or industry
Measures: money, sales volume, head count, inventory profit, actual
versus forecast
Time: daily, weekly, monthly, quarterly, or yearly
Snowflake Schema
[Figure: snowflake schema. Fact table SALES (UnitsSold, ...) linked to the dimensions PRODUCT, BRAND, DATE, MONTH, QUARTER, and GEOGRAPHY; attribute names shown include LineItem, Brand, Date, M_Name, Quarter, Q_Name, Division, and Country]
[Figure: second schema diagram. Fact table SALES (UnitsSold, ...) linked to the dimensions CATEGORY, PRODUCT, PEOPLE, STORE, and LOCATION; attribute names shown include Category, Division, LocID, and State]
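In a snowflake schema a dimension is itself normalized into further tables, so a query may need more than one join to reach an attribute. The sketch below uses the PRODUCT, BRAND, and SALES names from the slide, but the key columns and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- BRAND is split out of PRODUCT, which is what makes this a snowflake.
CREATE TABLE brand   (brand_id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, line_item TEXT,
                      brand_id INTEGER REFERENCES brand(brand_id));
CREATE TABLE sales   (product_id INTEGER REFERENCES product(product_id),
                      units_sold INTEGER);
INSERT INTO brand   VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO product VALUES (1, 'Cola', 1), (2, 'Juice', 1), (3, 'Soap', 2);
INSERT INTO sales   VALUES (1, 10), (2, 5), (3, 7), (1, 2);
""")

# Reaching the brand name takes two joins: SALES -> PRODUCT -> BRAND.
by_brand = conn.execute("""
    SELECT b.brand, SUM(s.units_sold)
    FROM sales s
    JOIN product p ON p.product_id = s.product_id
    JOIN brand b   ON b.brand_id   = p.brand_id
    GROUP BY b.brand
    ORDER BY b.brand
""").fetchall()
print(by_brand)
```

In a star schema the brand name would sit directly in the product dimension, trading some redundancy for one fewer join.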
Analysis of Data in DW
Online analytical processing (OLAP)
OLAP Activities
Application-Orientation vs. Subject-Orientation
[Figure: an application-oriented operational database organized by Loans, Credit Card, Trust, and Savings, contrasted with a subject-oriented data warehouse organized by Customer, Vendor, Product, and Activity]
OLTP (operational)        Warehouse (DSS)
Application oriented      Subject oriented
Used to run business      Used to analyze business
Detailed data             Summarized and refined
Current, up to date       Snapshot data
Isolated data             Integrated data
Repetitive access         Ad-hoc access
Clerical user             Knowledge user (manager)
Data Warehouse
Performance relaxed
Large volumes accessed at a time (millions)
Mostly read (batch update)
Redundancy present
Database size: 100 GB to a few terabytes
Query throughput is the performance metric
Hundreds of users
Managed by subsets
To summarize ...
OLTP Systems are
used to run a business
OLAP Operations
Slice: a subset of a multidimensional array
Dice: a slice on more than two dimensions
Drill Down/Up: navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
Roll Up: computing all of the data relationships for one or more dimensions
Pivot: used to change the dimensional orientation of a report or an ad hoc query-page display
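The operations listed above can be sketched on a tiny cube held in a Python dictionary. Everything here is a simplified assumption for illustration: the data, the dimension names, and the function names are invented, and roll up is modeled simply as summing away the dimensions that are not kept.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, month) -> units sold.
cube = {
    ("Juice", "N", 1): 10, ("Juice", "S", 1): 4,
    ("Cola",  "N", 1): 47, ("Cola",  "N", 4): 30,
    ("Milk",  "W", 4): 12,
}
DIMS = ("product", "region", "month")

def slice_(cube, dim, member):
    """Slice: fix one dimension to a single member."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == member}

def dice(cube, **members):
    """Dice: keep cells whose members fall in the given sets on several dimensions."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in allowed for d, allowed in members.items())}

def roll_up(cube, keep):
    """Roll up: aggregate away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)

def pivot(table):
    """Pivot: swap the two axes of a two-dimensional result."""
    return {(b, a): v for (a, b), v in table.items()}

cola = slice_(cube, "product", "Cola")
north_q1 = dice(cube, region={"N"}, month={1, 2, 3})
by_product = roll_up(cube, ["product"])
by_region_product = pivot(roll_up(cube, ["product", "region"]))
print(cola, north_q1, by_product, by_region_product, sep="\n")
```

Drilling down is the reverse of rolling up: going from `by_product` back to the product-by-region table, or to the full cube, exposes more detailed cells.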
Slicing Operations on a Simple Three-Dimensional Data Cube
[Figure: a 3-dimensional OLAP cube (Time, Geography, Product) with three slicing operations:
sales volumes of a specific Product on variable Time and Region;
sales volumes of a specific Region on variable Time and Products;
sales volumes of a specific Time on variable Region and Products]
Variations of OLAP
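[Figure: ROLAP architecture with a presentation layer, a ROLAP engine, and a database layer]
[Figure: multidimensional database architecture with a presentation layer, an MDDB engine, and a database layer; multi-dimensional reports are obtained from the DSS client]
[Figure: shared-memory hardware with CPUs, shared memory, and disks reached in one hop]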
Real-time/Active DW/BI
Enabling real-time data updates for real-time
analysis and real-time decision making is
growing rapidly
Push vs. Pull (of data)
RDW / ADW
Load strategies: Batch, Mini-Batch, Micro-Batch, Real-Time
Description
Batch: data is loaded in full or incrementally using an off-peak window.
Mini-Batch: data is loaded incrementally using intra-day loads.
Micro-Batch: source changes are captured and accumulated to be loaded in intervals.
Real-Time: source changes are captured and immediately applied to the DW.
Latency: Daily or higher | Hourly or higher | Second(s)
Capture: Filter Query | Filter Query | CDC | CDC
Initialization: Pull | Pull | Push
Target Load: High Impact
Source Load: High Impact | Queries at peak times necessary
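The difference between accumulating captured changes and applying them immediately can be sketched as follows. The classes and the change stream are hypothetical, and a batch size stands in for the load interval; this is not how any particular product implements change data capture.

```python
class Warehouse:
    """Hypothetical target: a dict of key -> latest value, plus a load counter."""
    def __init__(self):
        self.rows = {}
        self.loads = 0

    def apply(self, changes):
        for key, value in changes:
            self.rows[key] = value
        self.loads += 1

class MicroBatchLoader:
    """Accumulates captured source changes and loads them at intervals."""
    def __init__(self, warehouse, batch_size):
        self.warehouse = warehouse
        self.batch_size = batch_size  # stands in for a time interval
        self.pending = []

    def on_change(self, key, value):
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.warehouse.apply(self.pending)
            self.pending = []

class RealTimeLoader:
    """Applies every captured source change to the warehouse immediately."""
    def __init__(self, warehouse):
        self.warehouse = warehouse

    def on_change(self, key, value):
        self.warehouse.apply([(key, value)])

# Hypothetical stream of source changes pushed to both loaders.
changes = [("order-1", 100), ("order-2", 250), ("order-1", 120), ("order-3", 75)]

micro_dw, rt_dw = Warehouse(), Warehouse()
micro = MicroBatchLoader(micro_dw, batch_size=3)
rt = RealTimeLoader(rt_dw)
for key, value in changes:
    micro.on_change(key, value)
    rt.on_change(key, value)

stale = dict(micro_dw.rows)  # micro-batch view before the final flush
micro.flush()
print(stale, micro_dw.rows, rt_dw.rows, sep="\n")
```

Before the final flush the micro-batch warehouse lags the source by one change, while the real-time warehouse is current at the cost of one load per change.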
RDW / ADW
Need for real-time data warehousing
Decision Support has become operational
Integrated BI requires closed-loop analytics
The reach and impact of information access for
decision making can affect customer service, SCM,
and beyond.
Traditional hub-and-spoke architecture is difficult to
keep in sync
One huge DW so that data is centralized for BI/BA
tools
Real-time/Active DW at Teradata
Traditional vs Active DW
Environment
The Future of DW
Sourcing
Infrastructure
Real-time DW
Data management practices/technologies
In-memory processing (super-computing)
New DBMS
Advanced analytics
MDM
Master Data Management (MDM)
Master Data Management
Operational versus Analytical Master Data Management
Demystifying Master Data Management
Would You Like Fries With That? And Does Cross-Selling Justify Master Data Management?
Data management's top eight stories of 2008
Human resources data analytics brings metrics to
workforce management
Affecto 2008
Company   Country   Account   SubAccount   Date       Amount
Affecto   NO        505050    500          20080301   KR30.000
Metadata
Field         Type      Format
Company       Text      nVarchar(50)
Country       Text      Char(2)
Account       Integer   Int(6)
Sub-Account   Integer   Int(3)
Date          Date      Datetime (YYYYMMDD)
Amount        Float     Decimal
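Metadata of this kind can be used to check incoming records before they are loaded. The sketch below is a simplified assumption: the validator functions and their exact rules (length limits, digit counts) are invented, and the sample record writes the amount without its currency prefix so that it can be parsed as a decimal.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

def text(max_len):
    """Validator for a text field of at most max_len characters."""
    return lambda v: isinstance(v, str) and 0 < len(v) <= max_len

def integer(digits):
    """Validator for a whole number of at most `digits` digits."""
    return lambda v: isinstance(v, str) and v.isdigit() and len(v) <= digits

def date_yyyymmdd(v):
    """Validator for a date written as YYYYMMDD."""
    if not isinstance(v, str) or len(v) != 8:
        return False
    try:
        datetime.strptime(v, "%Y%m%d")
        return True
    except ValueError:
        return False

def decimal(v):
    """Validator for a value that parses as a decimal number."""
    try:
        Decimal(v)
        return True
    except (TypeError, ValueError, InvalidOperation):
        return False

# Metadata: one validator per field, following the types on the slide.
METADATA = {
    "Company":    text(50),       # nVarchar(50)
    "Country":    text(2),        # Char(2)
    "Account":    integer(6),     # Int(6)
    "SubAccount": integer(3),     # Int(3)
    "Date":       date_yyyymmdd,  # Datetime, YYYYMMDD
    "Amount":     decimal,        # Decimal
}

def validate(record):
    """Return the names of fields that are missing or violate the metadata."""
    return [f for f, ok in METADATA.items() if f not in record or not ok(record[f])]

good = {"Company": "Affecto", "Country": "NO", "Account": "505050",
        "SubAccount": "500", "Date": "20080301", "Amount": "30.000"}
bad = dict(good, Country="Norway", Date="2008-03-01")
print(validate(good))
print(validate(bad))
```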
Master data
Products: Software, Hardware, CPU
Customers: Affecto OY, Affecto AS, Affecto AB
Country: Europe, Norway, Sweden
[Figure: system landscape showing PPS, Essbase, Analysis Services, and DW sharing the dimensions Accounts, Entity, Project, Product, Location, and Channel, fed through ETL and EAI from ERP systems (Dynamics, SAP, Custom), with Admin and Review spreadsheets, Business User, IT Admin, and E-Mail]
[Figure: the same landscape with MDM added alongside ETL and EAI between the ERP systems (Dynamics, SAP, Custom) and PPS, Essbase, Analysis Services, and DW]
Compliance
International accounting standard
Transparency and auditability
DW Implementation Issues
Tasks for successful DW implementation
Establishment of service-level agreements and data-refresh
requirements
Identification of data sources and their governance policies
Data quality planning
Data model design
ETL tool selection
Relational database software and platform selection
Data transport
Data conversion
End-user support
DW Implementation Guidelines
Successful DW Implementation
Things to Avoid
Starting with the wrong sponsorship chain
Setting expectations that you cannot meet
Engaging in politically naive behavior
Loading the data warehouse with information just
because it is available
Believing that data warehousing database design is
the same as transactional database design
Choosing a data warehouse manager who is
technology oriented rather than user oriented
Unrealistic expectations
Inappropriate architecture
Low data quality / missing information
Loading data just because it is available
Q&A