
BUSINESS INTELLIGENCE

UNIT I Concept of Data Warehousing and Data Mining

Where is the wisdom? Lost in the knowledge. Where is the knowledge? Lost in the information. - T. S. Eliot

Where is the information? Lost in the data. Where is the data? Lost in the database! - Joe Celko, database consultant/writer

Through 2010, more than 35 percent of the top 5000 global companies will regularly fail to make insightful decisions about significant changes in their business and markets. - Gartner

Business Intelligence is the use of an organization's disparate data to provide meaningful information and analysis to employees, customers, suppliers and partners for more effective decision making. - Business Objects

BENEFITS OF BI
Timely information for decision making. Drill down and data mine large amounts of data. Observe trends and issues. Visualization of Key Performance Indicators. CRM.

BUSINESS INTELLIGENCE APPLICATIONS


Queries.
Reporting: including charts, graphs etc.
Analysis: slice and dice, drill down, pivoting.
Mining: hidden patterns, associations, classification, prediction.

Knowledge Management?

SOME IMPORTANT IT CONCEPTS


Client-Server Technology. Web applications. RDBMS.

COMMON DATA SOURCES


RDBMS. Spreadsheets. Flat files. XML. Data warehouses. Data marts.

OLTP
Online Transaction Processing. Designed to achieve efficient transactions such as INSERT and UPDATE. Normalization is important for data integrity. Difficult to access data for reporting and BI.

TYPICAL OLTP SCHEMA


Shop Locations

LocId  LocName    RegionId
1      Jayanagar  1
2      MG Road    1
3      Adyar      2
4      MG Road    3

Items

ItemId  ItemName                 CatId
1       Dove Soap                1
2       Pears Soap               1
3       Old Spice Shaving Cream  2
4       Iodex                    3

Product Categories

CatId  CatName
1      Soaps & Detergents
2      Cosmetics
3      Medicine

Transactions

Columns: TxId, ItemId, DateTime, LocId, Qty

TxId  Qty
1     5
2     10
3     2
4     12
5     1
6     1

ItemId, LocId: 1 1 1 2 3 4 1 2 4 3 2 4

TYPICAL BI QUESTIONS
What is the total sale of Pears soap in the last financial year? Quarter-wise as well, please.
Has the contribution of medicines to overall sales been increasing over the last 5 years?
Is the demand for cosmetics cyclical over a year?
Which regions are performing better (and worse), and how does that compare to last year?
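To illustrate why such questions are awkward to answer directly on the normalized OLTP schema, here is a minimal sketch using Python's built-in sqlite3 module. Table and column names loosely follow the schema above (the shortened table name Categories is an assumption), the transaction rows and their dates are invented for illustration, and quantity stands in for sales because the schema carries no price column.

```python
import sqlite3

def build_oltp(conn):
    """Create the normalized OLTP tables and load a few sample rows."""
    conn.executescript("""
        CREATE TABLE Categories (CatId INTEGER PRIMARY KEY, CatName TEXT);
        CREATE TABLE Items (ItemId INTEGER PRIMARY KEY, ItemName TEXT, CatId INTEGER);
        CREATE TABLE Transactions (TxId INTEGER PRIMARY KEY, ItemId INTEGER,
                                   DateTime TEXT, LocId INTEGER, Qty INTEGER);
    """)
    conn.executemany("INSERT INTO Categories VALUES (?, ?)",
                     [(1, "Soaps & Detergents"), (2, "Cosmetics"), (3, "Medicine")])
    conn.executemany("INSERT INTO Items VALUES (?, ?, ?)",
                     [(1, "Dove Soap", 1), (2, "Pears Soap", 1),
                      (3, "Old Spice Shaving Cream", 2), (4, "Iodex", 3)])
    # Hypothetical transactions: dates and item/location pairings are invented.
    conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?, ?)", [
        (1, 2, "2010-01-15 10:30:00", 1, 5),
        (2, 2, "2010-02-03 17:05:00", 2, 10),
        (3, 3, "2010-07-21 12:00:00", 4, 2),
        (4, 2, "2010-07-30 09:45:00", 3, 12),
        (5, 4, "2010-11-11 18:20:00", 2, 1),
        (6, 1, "2010-11-25 11:10:00", 4, 1),
    ])

def qty_by_quarter(conn, item_name):
    """Quarter-wise quantity for one item: needs a join plus date arithmetic."""
    sql = """
        SELECT (CAST(strftime('%m', t.DateTime) AS INTEGER) + 2) / 3 AS quarter,
               SUM(t.Qty)
        FROM Transactions t
        JOIN Items i ON i.ItemId = t.ItemId
        WHERE i.ItemName = ?
        GROUP BY quarter
        ORDER BY quarter
    """
    return conn.execute(sql, (item_name,)).fetchall()

def qty_by_category(conn):
    """Quantity per product category: needs a three-table join."""
    sql = """
        SELECT c.CatName, SUM(t.Qty)
        FROM Transactions t
        JOIN Items i ON i.ItemId = t.ItemId
        JOIN Categories c ON c.CatId = i.CatId
        GROUP BY c.CatName
        ORDER BY c.CatName
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
build_oltp(conn)
print(qty_by_quarter(conn, "Pears Soap"))
print(qty_by_category(conn))
```

Every report has to re-derive quarters from raw timestamps and walk the foreign keys, and it does so against the same tables that are serving live INSERT and UPDATE traffic.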

STAR SCHEMA
A central SALES FACT table linked to the surrounding dimension tables: ITEM DIMENSION, TIME DIMENSION, LOCATION DIMENSION, DIM 4, DIM 5, DIM 6.

DIMENSIONAL MODELING
Location Dimension

DimId  LocId  Location   Region
1001   1      Jayanagar  Bangalore
1002   2      MG Road    Bangalore
1003   3      Adyar      Chennai
1004   4      MG Road    Hyderabad

Time Dimension

DimId  MonthId  Month  Quarter  Year
1001   1        Jan    Q1       2010
1002   2        Feb    Q1       2010
1003   7        Jul    Q3       2010
1004   11       Nov    Q4       2010

Item Dimension

DimId  ItemId  Item                     Category
1001   1       Dove Soap                Soaps & Detergents
1002   2       Pears Soap               Soaps & Detergents
1003   3       Old Spice Shaving Cream  Cosmetics
1004   4       Iodex                    Medicine

FACT TABLES
Fact tables store business process metrics, which should normally be aggregates over the dimensions. Sales Fact :

TimeDim          ItemDim            LocDim            Sales
1001 (Jan 2010)  1003 (Old Spice)   1002 (MG Road)    2343
1003 (Jul 2010)  1002 (Pears Soap)  1001 (Jayanagar)  3143
1001 (Jan 2010)  1002 (Pears Soap)  1002 (MG Road)    1231
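The following sketch loads the three dimension tables and the Sales Fact rows shown above into an in-memory SQLite database and groups sales by a dimension attribute. The fact column names (TimeDimId, ItemDimId, LocDimId) and the helper names are assumptions made for this example; it is one possible way to query a star schema, not a prescribed design.

```python
import sqlite3

# Attributes a report may group by, mapped to qualified column names.
GROUPABLE = {"Quarter": "t.Quarter", "Item": "i.Item",
             "Category": "i.Category", "Region": "l.Region"}

def build_star(conn):
    """Create the dimension tables and the fact table, then load the sample rows."""
    conn.executescript("""
        CREATE TABLE LocationDim (DimId INTEGER PRIMARY KEY, LocId INTEGER,
                                  Location TEXT, Region TEXT);
        CREATE TABLE TimeDim (DimId INTEGER PRIMARY KEY, MonthId INTEGER,
                              Month TEXT, Quarter TEXT, Year INTEGER);
        CREATE TABLE ItemDim (DimId INTEGER PRIMARY KEY, ItemId INTEGER,
                              Item TEXT, Category TEXT);
        CREATE TABLE SalesFact (TimeDimId INTEGER, ItemDimId INTEGER,
                                LocDimId INTEGER, Sales INTEGER);
    """)
    conn.executemany("INSERT INTO LocationDim VALUES (?, ?, ?, ?)", [
        (1001, 1, "Jayanagar", "Bangalore"), (1002, 2, "MG Road", "Bangalore"),
        (1003, 3, "Adyar", "Chennai"), (1004, 4, "MG Road", "Hyderabad")])
    conn.executemany("INSERT INTO TimeDim VALUES (?, ?, ?, ?, ?)", [
        (1001, 1, "Jan", "Q1", 2010), (1002, 2, "Feb", "Q1", 2010),
        (1003, 7, "Jul", "Q3", 2010), (1004, 11, "Nov", "Q4", 2010)])
    conn.executemany("INSERT INTO ItemDim VALUES (?, ?, ?, ?)", [
        (1001, 1, "Dove Soap", "Soaps & Detergents"),
        (1002, 2, "Pears Soap", "Soaps & Detergents"),
        (1003, 3, "Old Spice Shaving Cream", "Cosmetics"),
        (1004, 4, "Iodex", "Medicine")])
    conn.executemany("INSERT INTO SalesFact VALUES (?, ?, ?, ?)", [
        (1001, 1003, 1002, 2343), (1003, 1002, 1001, 3143), (1001, 1002, 1002, 1231)])

def sales_by(conn, attribute):
    """Total sales grouped by one dimension attribute from GROUPABLE."""
    column = GROUPABLE[attribute]  # whitelist lookup keeps the query text safe
    sql = f"""
        SELECT {column}, SUM(f.Sales)
        FROM SalesFact f
        JOIN TimeDim t ON t.DimId = f.TimeDimId
        JOIN ItemDim i ON i.DimId = f.ItemDimId
        JOIN LocationDim l ON l.DimId = f.LocDimId
        GROUP BY {column}
        ORDER BY {column}
    """
    return conn.execute(sql).fetchall()

star = sqlite3.connect(":memory:")
build_star(star)
print(sales_by(star, "Quarter"))
print(sales_by(star, "Category"))
```

Because quarter, category and region are already stored as dimension attributes, every report uses the same join shape and only the grouping column changes.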

THE CUBE
[Figure: a cube with the Time, Item and Location dimensions as its three axes; a sub cube is marked inside it.]
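As a rough sketch of the cube idea, the three Sales Fact rows can be held in a dictionary keyed by (time, item, location). The helper names slice_cube and roll_up are hypothetical; real OLAP engines store and index cubes very differently, so this only illustrates what a sub cube and an aggregation over a dimension mean.

```python
from collections import defaultdict

AXES = ("time", "item", "location")

# One cell per (time, item, location) combination, taken from the Sales Fact rows.
CUBE = {
    ("Jan 2010", "Old Spice", "MG Road"): 2343,
    ("Jul 2010", "Pears Soap", "Jayanagar"): 3143,
    ("Jan 2010", "Pears Soap", "MG Road"): 1231,
}

def slice_cube(cube, **fixed):
    """Return the sub cube whose cells match the fixed axis values."""
    wanted = {AXES.index(axis): value for axis, value in fixed.items()}
    return {key: sales for key, sales in cube.items()
            if all(key[pos] == value for pos, value in wanted.items())}

def roll_up(cube, *keep):
    """Sum sales over every axis that is not listed in keep."""
    positions = [AXES.index(axis) for axis in keep]
    totals = defaultdict(int)
    for key, sales in cube.items():
        totals[tuple(key[pos] for pos in positions)] += sales
    return dict(totals)

print(slice_cube(CUBE, time="Jan 2010"))
print(roll_up(CUBE, "item"))
```

Fixing one axis value gives a slice, fixing several gives a smaller sub cube (dice), and roll_up with fewer axes corresponds to viewing the data at a coarser level.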

DATA WAREHOUSE DEFINITION


A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes. - W H Inmon

Subject Oriented : A simple and concise collection of data, usually related to a particular subject, that is used for decision making in a company.
Ex : Sales, Product, Supplier, Customer.

Integrated : Data from several sources is cleaned and integrated.

Non-Updatable : Operations are restricted to adding and accessing data. Physically separate from transactional data.

Time-Variant : Data is kept over time in order to study trends.

DW ARCHITECTURE
OLTP and OTHER DATA SOURCES -> ETL -> DATA WAREHOUSE -> BI TOOL -> Analysis, Mining, Queries, Reports

ETL PROCESS
SOURCE DATA (several sources) -> Extraction -> STAGING AREA -> Transform/Cleanse/Load -> DATA WAREHOUSE
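A minimal sketch of this flow in plain Python, under several assumptions: the two sources (an in-memory list standing in for OLTP rows and a CSV string standing in for a spreadsheet), the Amount column, and all sample values are invented for illustration, and the warehouse is just an append-only list of fact rows.

```python
import csv
import io
from collections import defaultdict

# Hypothetical source data: one OLTP extract and one spreadsheet export.
OLTP_ROWS = [
    {"ItemId": 2, "DateTime": "2010-01-15 10:30", "LocId": 1, "Amount": 120},
    {"ItemId": 2, "DateTime": "2010-01-20 16:00", "LocId": 1, "Amount": 80},
    {"ItemId": 4, "DateTime": "2010-11-11 18:20", "LocId": 2, "Amount": 45},
]
SPREADSHEET_CSV = "ItemId,DateTime,LocId,Amount\n3,2010-07-21 12:00,2,90\n"

def extract():
    """Copy rows from every source into one staging list, unchanged."""
    staging = [dict(row) for row in OLTP_ROWS]
    staging.extend(csv.DictReader(io.StringIO(SPREADSHEET_CSV)))
    return staging

def transform(staging):
    """Unify types and aggregate to one row per (month, item, location)."""
    totals = defaultdict(int)
    for row in staging:
        month = str(row["DateTime"])[:7]  # 'YYYY-MM'
        key = (month, int(row["ItemId"]), int(row["LocId"]))
        totals[key] += int(row["Amount"])
    return [key + (amount,) for key, amount in sorted(totals.items())]

def load(warehouse, fact_rows):
    """Append the new fact rows; existing warehouse rows are never modified."""
    warehouse.extend(fact_rows)
    return warehouse

warehouse = load([], transform(extract()))
print(warehouse)
```

Keeping the staging list separate from the warehouse means type conversion and aggregation can fail or be rerun without touching data that has already been loaded.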

DATA CLEANSING
Ensures inconsistent data does not enter the data warehouse:
Missing data: nulls, zeros, zero-length strings. Wrong formats: dates, telephone numbers. Dangerous default values. Out-of-range values.

Usually done in a staging area. Data standardization and verification against standard lists. Record matching. Data transformation, e.g. 1 and 2 to M and F. Should minimize deletions.
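The checks listed above might look like the following sketch. The field names (name, date, qty, gender), the accepted date formats and the 1 to 1000 quantity range are assumptions chosen for the example; a real staging area would take such rules from the business. Problems are flagged on the record rather than the record being dropped, in keeping with the advice to minimize deletions.

```python
from datetime import datetime

GENDER_CODES = {"1": "M", "2": "F"}       # transformation: 1 and 2 to M and F
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")   # assumed input formats
QTY_RANGE = (1, 1000)                     # assumed valid range

def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def cleanse(record):
    """Return (clean_record, problems) without discarding the record."""
    clean, problems = dict(record), []

    # Missing data: nulls and zero-length (or blank) strings.
    name = (record.get("name") or "").strip()
    clean["name"] = name or None
    if not name:
        problems.append("missing name")

    # Wrong formats: normalize dates to a single representation.
    clean["date"] = parse_date(str(record.get("date") or ""))
    if clean["date"] is None:
        problems.append("bad date")

    # Out-of-range values; a zero quantity is treated as a suspicious default.
    qty = record.get("qty")
    if not isinstance(qty, int) or not QTY_RANGE[0] <= qty <= QTY_RANGE[1]:
        problems.append("qty out of range")

    # Standardization against a known code list.
    clean["gender"] = GENDER_CODES.get(str(record.get("gender")))
    if clean["gender"] is None:
        problems.append("unknown gender code")

    return clean, problems

print(cleanse({"name": " Asha ", "date": "15/01/2010", "qty": 5, "gender": 1}))
print(cleanse({"name": "", "date": "2010-13-01", "qty": 0, "gender": "9"}))
```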

DW DESIGN STEPS
Planning. Requirement study. Problem analysis. Warehouse design. Data integration and testing. Development of data warehouse.

DW DESIGN PROCESS
Choosing business process for modeling. Choosing the fundamental unit of data to be represented. Choosing the dimensions for the data representation. Choosing various measures for fact tables.

DATA MARTS
Subsets of DW. User community-specific (Ex : department specific) data stores. Each department has its own data mart maintained by the departmental IS team. Top-down and bottom-up approaches to build data marts.

TWO APPROACHES
Top-Down. Create a data warehouse for the enterprise. If required, create data marts for the departments and load data into these.
Bottom-Up. Create data marts. Roll the marts up into a cohesive data warehouse.
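The two directions can be sketched with plain Python lists. This is only an illustration under simplifying assumptions: a mart is modelled as the warehouse rows for one product category, the row layout (month, item, category, location, sales) and the helper names are invented for the example, and real roll-ups also have to reconcile dimensions that differ between marts.

```python
# Row layout assumed for this sketch: (month, item, category, location, sales)
WAREHOUSE = [
    ("Jan 2010", "Old Spice Shaving Cream", "Cosmetics", "MG Road", 2343),
    ("Jul 2010", "Pears Soap", "Soaps & Detergents", "Jayanagar", 3143),
    ("Jan 2010", "Pears Soap", "Soaps & Detergents", "MG Road", 1231),
]
CATEGORY = 2  # index of the category field in a row

def build_mart(warehouse, category):
    """Top-down: carve a subject-specific subset out of the warehouse."""
    return [row for row in warehouse if row[CATEGORY] == category]

def roll_up_marts(*marts):
    """Bottom-up: combine marts into one store, skipping duplicate rows."""
    seen, combined = set(), []
    for mart in marts:
        for row in mart:
            if row not in seen:
                seen.add(row)
                combined.append(row)
    return combined

cosmetics_mart = build_mart(WAREHOUSE, "Cosmetics")
soaps_mart = build_mart(WAREHOUSE, "Soaps & Detergents")
print(roll_up_marts(cosmetics_mart, soaps_mart))
```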

The data warehouse is nothing more than the union of all the data marts. - Ralph Kimball.

You can catch all the minnows in the ocean and stack them together and they still do not make a whale. - Bill Inmon.

UNSTRUCTURED DATA : CONTENT


Notes. E-mails. Faxes. Word processing. Spreadsheets. Invoices and other hard-copy output.

Content usually has to reside close to the user. Content must follow life cycle procedures. Content comes in many formats. Cannot be easily accessed using SQL.

CONCLUSION
