
OPEN SOURCE DATA WAREHOUSE/BI: A PRIMER


Webinar session for TechGig.com
Presenter: Parthasarathi Doraisamy
Enterprise BIDI Solutions


1
CLOUD --WHAT DOES THIS MEAN?
UC Berkeley RAD Lab definition:

1. The illusion of infinite computing resources available on
demand, thereby eliminating the need for Cloud Computing
users to plan far ahead for provisioning

2. The elimination of an up-front commitment by Cloud users,
thereby allowing companies to start small and increase hardware
resources only when there is an increase in their needs; and

3. The ability to pay for use of computing resources on a short term
basis as needed (e.g., processors by the hour and storage
by the day) and release them as needed, thereby rewarding
conservation by letting machines and storage go when they are
no longer useful.

2
REFERENCES/ACKNOWLEDGEMENT
Talend
Pentaho
BIRT (Eclipse)
Birst
Jaspersoft
Greenplum
ASA ODW model
Gartner research analysis
TDWI

3
WHAT IS OPEN DW/BI?
Beware: open does not mean the product(s) are free!

Open DW consists of a pre-designed, prebuilt data warehouse architecture that
comes free

It thereby reduces overall cost and risk by cutting design, development and
implementation time

-> Reduces the consumer's initial development cost (DQ, ETL, BI & analytics, etc.)

But vendors charge for the related services: maintaining the DW solution,
customizing it further to the exact business need, and support and
maintenance of the system.

Mitigates risk through rapid development

There are technical, social, and economic reasons that will move data
warehousing and, perhaps, all data models toward open solutions
4
NEED FOR OPEN DW/BI
Open data warehouse and BI development has
progressed rapidly over the past few years,
driven by the economic downturn
Dynamic business changes demand faster
deployment of the proposed solution
Nowadays open source products are available
for almost every aspect of the BI/data
warehouse stack, including architectures, and
they are picking up pace (a few noticeable players:
Talend, Pentaho, Jaspersoft, Birst, QlikView, etc.)
5
INDUSTRY STATS ON TRADITIONAL DWBI
The average cost of a traditional DW/BI project was $2.2
million ($3.1 million today, adjusted for inflation).
The average payback period was 2.3 years,
with over 30% experiencing a payback period
of five years or more.
The majority of respondents reported that their
data warehouses consumed enormous
resources and remained works in progress for
extended periods of time.
6
NEED FOR OPEN DW/BI
Popular open source databases that support
these open data warehouses include MySQL
(and its ecosystem of add-ons), Ingres and
EnterpriseDB.
Hardware and software costs are reduced
further by extending the open solution into
a hosted SaaS environment.

7
ODW MODEL: A FRAMEWORK
The Open Data Warehouse Model (ODWM)
provides a generic framework for delivering an
open data warehouse
This generic data warehouse model can be
fine-tuned further for a specific industry
Domain experts work on these industry-specific
solutions much as they did on typical proprietary
DW/BI solutions, but with one critical difference:
the open DW/BI architecture, data model, ETL
design and BI design are pre-designed for the
industry domain concerned


8
ODW MODEL PRINCIPLE
The open data model consists of hundreds of potential dimension tables
with thousands of fields, which form the foundation

These open data warehouses are carefully designed to ensure stability of
the DW system and to facilitate the use of commercial ETL
bridges/connectors

(yet allow for interpretation through aggregation and by other means)

OLAP cubes and data marts can be constructed from the foundation, as
the business requires, through similar bridges/connectors

This is an opportunity for developers in their respective technology
areas (ETL, BI and analytics) to come up with appropriate bridge
solutions that turn the entire ODW & BI model into a functional
data mart or enterprise data warehouse
9
ODW MODEL & ITS EXTENSIONS
ODW models must allow for integration of multiple data
sources of different granularity, and should, in some
manner, accommodate slowly changing dimensions
Each baseline ODW database instance model can in turn
yield a range of domain-specific packaged solutions (we
can call each one an IndustrySlice). These packages may
comprise DQ, ETL and BI solutions, as outlined earlier.
These packaged solutions comprise
Host the domain-specific ODW solution(s) in the
cloud.
These hosted open DW/BI solutions lead us to
packaged data warehouse/BI appliances
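
One common way to accommodate slowly changing dimensions is the "Type 2" approach, where a change closes the current row and appends a new version so that history is preserved. The sketch below is only an illustration under that assumption; the slides do not say which approach ODW uses, and the table layout and names (CustomerRow, apply_change, city) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: str                  # business key from the source system
    city: str                         # the attribute whose history we track
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"


def apply_change(dim: List[CustomerRow], customer_id: str,
                 new_city: str, as_of: date) -> None:
    """Type 2 update: expire the current row and append a new version."""
    current = next((r for r in dim
                    if r.customer_id == customer_id and r.valid_to is None),
                   None)
    if current is not None:
        if current.city == new_city:
            return                    # nothing changed, keep history as-is
        current.valid_to = as_of      # close the old version
    dim.append(CustomerRow(customer_id, new_city, valid_from=as_of))


# Hypothetical sample data: one customer who moves city once.
dim: List[CustomerRow] = []
apply_change(dim, "C1", "Chennai", date(2009, 1, 1))
apply_change(dim, "C1", "Bangalore", date(2010, 6, 1))
```

After the two calls the dimension holds both versions of customer C1, so a report can be run "as of" either date.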



10
OPEN DATA WAREHOUSE/BI APPLIANCE
11
OPEN DWBI APPLIANCES
The open DW/BI appliance combines and
supports thousands of data warehouses, many
of them with hundreds of millions of records, in a
scalable multi-tenant environment.
These appliances can generate complex data
models and have complex algorithms built
into their query engine
Appliance vendors tie up with hardware
suppliers to build the appliance so that it
performs at maximum efficiency
12
OPEN DWBI APPLIANCES
These appliances are designed to power an
on-demand software solution that must
support a large number of simultaneous
users and be able to increase capacity
quickly
Built on a shared-nothing architecture: no
data is shared across nodes (servers).
Popular appliances include Netezza and
Greenplum.
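
The shared-nothing idea can be sketched in a few lines: every row is owned by exactly one node, each node works only on its own slice, and only small partial results are combined. This is a toy illustration, not taken from any particular appliance; the node count, the choice of CRC32 as the hash, and the function names are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_NODES = 4  # hypothetical cluster size

Row = Tuple[str, float]  # (distribution key, amount)


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a distribution key to exactly one node using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: List[Row]) -> Dict[int, List[Row]]:
    """Place each row on the single node that owns its key."""
    nodes: Dict[int, List[Row]] = defaultdict(list)
    for key, amount in rows:
        nodes[node_for(key)].append((key, amount))
    return nodes


def total_amount(nodes: Dict[int, List[Row]]) -> float:
    """Each node sums its own slice; only the partial sums are combined."""
    partials = [sum(amount for _, amount in local_rows)
                for local_rows in nodes.values()]
    return sum(partials)


# Hypothetical sample rows.
rows = [("cust-1", 10.0), ("cust-2", 5.0), ("cust-1", 2.5), ("cust-3", 7.5)]
nodes = distribute(rows)
```

Because rows with the same key always land on the same node, per-key work needs no cross-node data, and capacity can grow by adding nodes and redistributing.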
13
MULTIPLE APPLIANCES FOR ENTERPRISE NEED
14
DWBI APPLIANCES SALIENT FEATURES
High Availability and Failover Support
Designed for operation in a high-availability clustered Open DWBI
environment
Global Cache
Provides superior query performance via its massive-scale
caching capabilities

Simplified software Deployment and Upgrades in Place

Dramatically simplifies its deployment by freeing IT from having to
worry about resolving potentially complex OS compatibility issues,
library dependencies or undesirable interactions with other
applications.


15
DWBI APPLIANCES SALIENT FEATURES
Advanced ETL services and a complete
analytical data warehouse with automated
warehouse generation
Cloud connectors for connecting to operational
cloud applications, e.g. Salesforce.com, Google
Analytics
These connectors allow automatic uploading
of data into the appliance from various sources
Live Access, which allows you to analyze data
from on-premise data warehouses without
uploading

16
SAAS BASED OPEN BI SOLUTION
17
SAAS OPEN BI SOLUTION..
Low-cost, open source solution.
End-to-end, integrated BI and ETL
capabilities.
Full enterprise-level support.
Flexibility of on-demand and on-premise
deployment.
Support for mobile devices as a BI platform.
Support for iterative IT and business-user
report generation process.
18
CLOUD --WHAT DOES THIS MEAN?
Depends upon how you slice it vertically
IaaS: AWS, GoGrid, Mosso
PaaS: Google App Engine, Microsoft Azure
SaaS (BaaS): Salesforce, Talend, Jaspersoft,
Pentaho, BIRT, etc.
19
AGILE BI: FASTER, CHEAPER, BETTER
20
CLOUD --WHAT DOES THIS MEAN?
21
ODW -WHEN TO USE THE CLOUD?
Transient application lifespan or use
Quick start required
Budget pressure
Variable use/scale of application unknown
IT unavailable/unresponsive
22
SAAS OPEN DWBI
23
KEY FINDINGS FOR BUSINESS TRANSITION TO
CLOUD TECHNOLOGY (IN 2009)


By 2012, at least 50% of direct commercial revenue attributed to
open-source products or services will come from projects under a
single vendor's patronage.
Through 2011, less than 50% of Global 2000 IT organizations will
have implemented a formal open-source adoption and
management policy as part of an enterprise software asset
management strategy.
Through 2013, 50% of mainstream IT projects using open-source
software (OSS) will not achieve cost savings over closed-source
alternatives.
Through 2013, 90% of market-leading, cloud-computing providers
will depend on OSS to deliver products and services.

24
MOVING TO CLOUD-RECOMMENDATIONS

Expect vendors to play an increasing role in the governance of
many market-leading, open-source solutions during the next
several years.
Move aggressively to establish an effective enterprise adoption
policy, and bring OSS and hardware under asset management
controls.
Do not expect to automatically save money with OSS or any
technology without effective financial management. Do expect to
carefully manage open-source solutions in the appropriate
scenarios to realize total cost of ownership (TCO) advantages.
Manage cloud-based software strategies and open-source
strategies together for maximum effect. Look for synergies
between both, and the ability of OSS to move your workloads to
the cloud.

25
STRATEGIC PLANNING ASSUMPTION(S)

By 2012, at least 50% of direct commercial revenue
attributed to open-source products or services will
come from projects under a single vendor's
patronage.

Through 2011, less than 35% of Global 2000 IT
organizations will have implemented a formal
open-source adoption and management policy.

Through 2013, 50% of mainstream IT projects using
OSS will not achieve cost savings over closed-source
alternatives.

Through 2013, 90% of market-leading, cloud-computing
providers will depend on OSS to deliver
products and services.
26
CLOUD USAGE BY VARIOUS ORGANIZATIONS..
27
OPEN SOURCE BI TOOLS
28
TDWI RESEARCH STUDY
29
SAAS BI PROCESS FLOW
30
HARDWARE ACCESS IN CLOUD OPEN DW/BI

Secure access via web, RDC, VPN or a combination
Customized server (choose your own
CPU, RAM, disk space)
Scale up your capacity anytime
Level 2 and 3 server support, including 24x7
monitoring service
Application support on demand
Integrate with your local and global IT groups

31
SECURITY ASPECTS IN CLOUD OPEN DW/BI
Access via web, RDC, VPN or a combination
Firewalls
Certified data center (SAS 70 Type II)
Non-disclosure agreement (NDA)
Virus protection

32
MDM


MDM success for enterprise open source
DW/BI implementation
High-quality master data is extremely
valuable to enterprise business
processes and analytics
33
MDM-KEY CONSIDERATIONS
Some key considerations for creating a
master reference data source are outlined
below:
Central master reference data model
Mapping
Populating the master
Publish data
Access and provisioning
Ownership and process
34
MDM CHECKLIST
MDM gives the enterprise a single version of
the truth across its various applications,
despite the disparity of source systems
The following checklist provides functional
requirements for implementing and deploying
MDM in an enterprise environment:
35
MDM CHECKLIST: FUNCTIONALITY COVERED
Profiling
Modeling
Data quality
Data stewardship & governance: hierarchy
management & security
Workflow administration
36
MDM-ACTIVE DATA MODEL

Multi-Domain capability

Object-Oriented Data Modeling

Domain Templates

Basic Data Validations and Business Rules

Graphical Modeling Tool

Multiple Language Support

37
MDM-DOMAIN INTEGRATION

Complete Data Integration Functionality

Automated Services-Based Integration

Real-Time and Batch Integration

SOA Manager/Console

38
MDM-DQ INTEGRATION WITH ETL,BI

Data Profiling

Accurate Data Match and Merge

Data Bucketing and Blocking

Data Augmentation

Advanced Data Validations and Business Rules

Data Standardization

Data Cleansing
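
Standardization, blocking and matching from the list above can be sketched together. This is a minimal illustration under stated assumptions, not how any listed product works: the sample records, the choice of postal code as the blocking key, the 0.6 threshold and the use of Python's difflib for similarity are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical source records; ids 1 and 2 describe the same company.
records = [
    {"id": 1, "name": "Acme Corporation", "zip": "600001"},
    {"id": 2, "name": "ACME Corp.", "zip": "600001"},
    {"id": 3, "name": "Bharat Traders", "zip": "560001"},
]


def standardize(name: str) -> str:
    """Data standardization: lower-case and strip punctuation."""
    kept = (ch for ch in name.lower() if ch.isalnum() or ch == " ")
    return "".join(kept).strip()


def block_key(rec: dict) -> str:
    """Blocking: only records that share this key are ever compared."""
    return rec["zip"]


def candidate_matches(recs: List[dict],
                      threshold: float = 0.6) -> List[Tuple[int, int]]:
    """Return id pairs whose standardized names are similar enough."""
    blocks: Dict[str, List[dict]] = defaultdict(list)
    for rec in recs:
        blocks[block_key(rec)].append(rec)

    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(None, standardize(a["name"]),
                                    standardize(b["name"])).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


matches = candidate_matches(records)
```

Blocking keeps the number of comparisons manageable on large data sets, at the cost of missing duplicates whose blocking key differs; matched pairs would then go on to the merge step.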

39

MDM-DATA STEWARDSHIP & GOVERNANCE



Hierarchy Management: Multiple and Recursive
Hierarchies

Hierarchy Import and Overlays

Business Process Management (BPM) and Workflow

Automated Data Survivorship

Manual Resolution through an intuitive GUI
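
Automated data survivorship means deciding by rule which value of each attribute ends up in the master ("golden") record when several duplicates disagree. The sketch below assumes one simple rule, "most trusted source first, newest record as tie-breaker"; the source ranking, field names and sample values are hypothetical and not drawn from any product.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical trust ranking of source systems, most trusted first.
SOURCE_PRIORITY = ["crm", "billing", "web_form"]


def survive(duplicates: List[Dict[str, Any]],
            fields: List[str]) -> Dict[str, Any]:
    """Build a golden record: per field, take the first non-empty value
    from the most trusted source (newest record breaks ties)."""
    ranked = sorted(
        duplicates,
        key=lambda r: (SOURCE_PRIORITY.index(r["source"]),
                       -r["updated"].toordinal()),
    )
    golden: Dict[str, Any] = {}
    for field in fields:
        golden[field] = next((r[field] for r in ranked if r.get(field)), None)
    return golden


# Hypothetical duplicates of one customer from three systems.
dups = [
    {"source": "web_form", "updated": date(2010, 5, 1),
     "name": "A. Kumar", "phone": "044-5550100",
     "email": "a.kumar@example.com"},
    {"source": "crm", "updated": date(2009, 3, 1),
     "name": "Arun Kumar", "phone": None, "email": None},
    {"source": "billing", "updated": date(2010, 1, 1),
     "name": "Arun Kumar", "phone": "044-5550199", "email": None},
]

golden = survive(dups, ["name", "phone", "email"])
```

Fields that no rule can settle (here, a field that comes back as None) are the natural candidates for manual resolution by a data steward.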

40
MDM-ADMINISTRATION

Historical Views of Hub Data

Hub Versioning

Master Data Audit Trail Information

Roles-Based Security and Active Directory Integration


Versioning

41
TALEND MDM SOLUTION OS PRODUCTS
IBM Eclipse; JBoss Application Server and Portal;
eXist open database;
XSD / XML Schema for the XML data models;
XSLT for data transformation;
Object programming following the EJB 2.1
("Enterprise JavaBeans") standard on the JBoss server;
XQuery for queries on the XML database;
Document/literal WS-I ("Web Services
Interoperability") norm for web services;
Bonita for business process management.
42
COST COMPARISON
43
SUMMARISED COST: SMALL ETL PROJECT
E.g. total cost for a small project, comparing three approaches to
data integration: open source, proprietary and manual coding
44
SUMMARY COST FOR MEDIUM ETL PROJECT
45
ODW /BI --WHY IT WILL SUCCEED IN MARKET
ODW/BI has many groups of (financial) winners:
Owners get low-cost, rapid entry into a data
warehouse they can extend.
Developers get to create and sell new ETL/BI products
in a new market (tool providers).
Source vendors can solve reporting problems and
advance new ways to compete (source providers).
Consultants get a bigger market for their services
(service providers).
Domain experts can participate by creating new open
data warehouses using their deep industry
knowledge (service providers).
46
ODW /BI --WHY IT WILL SUCCEED IN MARKET
Development licenses
Training curve
Development time
Run-time licenses
Deployment of hardware and operating
system licenses
IT operations
47
ODW /BI --WHY IT WILL SUCCEED IN MARKET
Maintenance/subscription
Maintenance time
Reliability and predictability of the data
integration processes

48
QUESTIONS?
For any questions, please get in touch with me at

Partha.dorai@ebidisolutions.com

Skype: ebidisolutions


49
Thank You!
50