platform for the Data Lake
How the KAVE offers the relevant technology to cover the use cases of a Data Lake solution
Table of contents
• Open Source for Analytics: an established yet evolving trend
• Data Lakes
© 2017 KPMG N.V., a Dutch limited liability company, is a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (‘KPMG International’), a Swiss entity. All rights reserved.
Open Source for Analytics: an established yet evolving trend
The most successful Fortune 500 companies run and grow on Open Source Data & Analytics software
Open source technology is empowering: processing 510,000 comment postings, 293,000 status updates and 136,000 photo uploads at Facebook, per minute
Google has released over 900 open source projects, totaling over 20 million lines of code.
Developer time spent on open source amounts to about $1B worth of salaries per year.
Barclays claimed to have cut costs by up to 90% in the last five years by adopting open source for its cloud strategy
Closed-source D&A: drawbacks & risks
Less efficiency & flexibility for data exploration: analysts’ tools & techniques are too different
Data analytics software usage & relevance growth: Open vs proprietary
[Chart: growth in percent, open source vs. proprietary (also partially proprietary) software. Source: KDnuggets 2016 Software Poll]
Lock-in solution: cannot easily integrate, customize and migrate
Federal Source Code Policy: at least 20% of newly developed software must be released as open source, to encourage usability and prevent lock-in
Falling behind: cannot introduce state-of-the-art techniques or redesign
In 2008 Nokia very successfully open-sourced its Symbian OS handset system, at an expense of about $300M
Data Lakes
“A Data Lake is a centralized, integrated and large-scale data repository for the organization. The Data Lake empowers a pan-organizational and holistic view on the information. It collects all of the relevant organizational data assets with a structure-oblivious approach.”
The Data Cycle
The Data Lake: driving the analytics evolution
[Diagram labels: Transformation target; People & collaboration; Technology; Impact]
The Data Lake: analytics processes & new strategies
• Predictive maintenance
• Risk analysis
• Data-driven decision making
• Inventory & Chain intelligence
• Pricing models
• Optimized marketing
• Data integration
Data: not a by-product but a source of value
Enterprise Data Lake: analytics-driven organization
Data analytics: make value out of data
• Collect & Integrate
• Data-on-demand, agile access
• Frameworks for data exploration, proofs of concept, production
Focus: people
• Not just a tech trend: real value for CIOs
• Main reference for CDOs
• Enhanced customer experience, ad-hoc scenarios
• Comply with the organization structure with respect to data
• Valorize your analyst team, attract new talent
What is the KAVE?
KAVE: extension of the Hortonworks Hadoop distribution
[Diagram layers:]
• KAVE extension
• Hortonworks Data Platform distribution
• Hadoop core software
Data Lakes established technology ecosystem: Hadoop
• Modern architecture & service
• De-facto industry standard for Big Data
Open source:
• Free, no license cost
• OK for commercial products
• Customizable, no lock-in
• Professional support
KAVE: extension of the Hortonworks Hadoop distribution
Hortonworks Data Platform distribution:
• Standard installation, partially automated
• Additional software (management, monitoring, …)
• Vendor solution: global tech support
KAVE: extension of the Hortonworks Hadoop distribution
• Data exploration & analysis
• Collaboration & Development
• BI/visualization integration
• Web interfaces
• Integrated security layer
• Automated installation on Microsoft Azure
KAVE: extension of the Hortonworks Hadoop distribution
Continuous improvement, up-to-date with Data Lake & Analytics technology
KAVE & the fulfilment of the Data Lake evolution
Enterprise Data Lake: topics & directions
• Security & Compliance
• ETL
• DWH/BI functionality
• Modern cloud deployment
Data Warehouse & Business Intelligence functionalities
The traditional DWH/BI stack
ETL
Traditional DWH/BI stack: capacity scale
• Costs?
• Performance?
• SQL-only?
Enterprise Data Lake: ELT scaling in KAVE
[Diagram: load (L) steps, many transform (T) steps and the DWH]
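As a rough illustration of the ELT idea named on this slide, raw data is loaded first and independent transformations then run side by side on the loaded data. The sketch below is a minimal Python example under that assumption. The names `raw_store`, `load`, `transform_all` and the two sample transformations are hypothetical and are not part of KAVE or Hadoop; a local thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw zone of the lake: records are kept exactly as received.
raw_store = []


def load(records):
    """Load step (L): append raw records without transforming them first."""
    raw_store.extend(records)


def total_amount(rows):
    """Hypothetical transformation (T): sum of all amounts."""
    return sum(r["amount"] for r in rows)


def count_by_dept(rows):
    """Hypothetical transformation (T): number of records per department."""
    counts = {}
    for r in rows:
        counts[r["dept"]] = counts.get(r["dept"], 0) + 1
    return counts


def transform_all(rows, transforms):
    """Run every transformation on the same raw data, in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, rows) for name, fn in transforms.items()}
        return {name: fut.result() for name, fut in futures.items()}


load([
    {"dept": "CRM", "amount": 10},
    {"dept": "R&D", "amount": 5},
    {"dept": "CRM", "amount": 7},
])
results = transform_all(raw_store, {"total": total_amount, "by_dept": count_by_dept})
```

Because the raw records stay untouched, a new transformation can be added later without reloading the data.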
Enterprise Data Lake: evolution of the DWH/BI stack
KAVE: fully-automated ETL facilities
• Define, schedule and manage ETL pipelines in a graphical way
• Integrated and automatic metadata creation and management
• Seamless import of heterogeneous data sources (logs, queues, files, webpages…)
• Advanced and optimized Hadoop storage formats
Enterprise Data Lake: OLAP & OLTP workloads
[Diagram: OLAP and OLTP workloads, with JDBC/ODBC and wrappers]
KAVE: OLAP & OLTP workloads
BI & Big Data: are we there yet?
[Diagram labels: R&D, CRM, Logistics]
KAVE: reports & visualization
JDBC/ODBC
BI platform
KAVE: reports & visualizations
Controlling the access and usage of the data
Enterprise Data Lake: security & governance
Finance department: no access to their data
Enterprise Data Lake: security & governance
Finance department cannot run Spark on test cluster
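One way to read restrictions like the ones on these slides is as rules over (group, action, resource) triples. The following is a minimal Python sketch under that assumption. It does not describe the actual KAVE security layer, and every group, action and resource name in it is hypothetical.

```python
# Hypothetical grants and denials, each a (group, action, resource) triple.
GRANTS = {
    ("finance", "read", "finance_reports"),
    ("finance", "run_spark", "production_cluster"),
}
DENIES = {
    ("finance", "read", "restricted_data"),
    ("finance", "run_spark", "test_cluster"),
}


def is_allowed(group, action, resource):
    """Default deny: a request passes only if it is granted and not denied."""
    request = (group, action, resource)
    if request in DENIES:
        return False
    return request in GRANTS
```

With these sample rules, `is_allowed("finance", "run_spark", "test_cluster")` is rejected, while the same job on the hypothetical production cluster is accepted.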
KAVE: full data management on secured infrastructure
From experiments to production
Data-centric software products with cycle optimization
[Cycle diagram with numbered steps; step labels:]
• Brainstorming & exploration on data
• Definition & Consolidation of needed datasets
• Business-significant proof-of-concept
• Productization & deployment
• Solution monitoring and value measurement
• Corrections & improvements
KAVE & data-centric development model: a glance
Prototype data product deployment for the web
WEB
TRAFFIC
The modern Cloud experience
The many unknowns of low-automation infrastructure
• IT support: slow & “open a ticket” model
• Sustained old infrastructure investment
• Quality of service, service-level agreements
• Security
• Process bottleneck: wait for IT
• Premises vs Cloud dichotomy
• Compliance & Regulations
• Uncertainty in operational costs
• Rollout & Releases
• Modularity & Isolation
Satisfying the data product cycle: continuous delivery
AUTOMATED
The need for an integrated, dynamic and automated infrastructure
• Seamless across premises & cloud
• API interface automation
• Scale-as-you-go
• Direct control but intelligent self-healing
• Accurate costs control
• Modularity
The preferred infrastructure model for KAVE: Azure
• Datacenters
• PaaS (platform-as-a-service)
• Security
• IaaS (infrastructure-as-a-service)
Azure: preferred infrastructure for KAVE
The idea of Data Lake as a service with KAVE
Try a fully working KAVE instance on the Azure marketplace!
Open source: contribute & extend to fit your needs!
© 2016 KPMG Advisory N.V., registered with the trade register in the Netherlands under number 33263682, is a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (‘KPMG International’), a Swiss entity. All rights reserved. The KPMG name, logo and ‘cutting through complexity’ are registered trademarks of KPMG International.
Thanks!